vit-artworkclassifier
This model identifies the artwork style of an input image, offering a practical tool for image classification in the art domain. It is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset, a subset of the artbench-10 dataset.
Quick Start
This model returns the artwork style of any input image. It is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset, a subset of the [artbench-10 dataset](https://www.kaggle.com/datasets/alexanderliao/artbench10) with a train set of 1000 artworks per class and a validation set of 100 artworks per class. It achieves the following results on the evaluation set:
- Loss: 1.1392
- Accuracy: 0.5948
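For a quick check, the model can be loaded through the `transformers` image-classification pipeline. This is a minimal sketch, not the authors' own snippet: it loads the feature extractor from the base checkpoint (mirroring the usage example further down, in case the fine-tuned repo does not ship its own preprocessing config), and `artwork.jpg` is a placeholder for any local image.

```python
from transformers import ViTFeatureExtractor, pipeline

# Preprocessing comes from the base checkpoint, as in the usage example below
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")

# Wrap the fine-tuned classifier in an image-classification pipeline
classifier = pipeline(
    "image-classification",
    model="oschamp/vit-artworkclassifier",
    feature_extractor=feature_extractor,
)

# "artwork.jpg" is a placeholder path; any PIL-compatible image works
print(classifier("artwork.jpg"))
```

The pipeline returns a list of `{label, score}` dictionaries sorted by confidence.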
Features
- Artwork Style Classification: Predicts the artwork style of an input image across nine artbench-10 classes.
- Fine-Tuned Model: Built on a well-known ViT base checkpoint and fine-tuned on an artbench-10 subset for this specific task.
Documentation
Model description
You can find a description of the project that this model was trained for here: https://medium.com/@oliverpj.schamp/training-and-evaluating-stable-diffusion-for-artwork-generation-b099d1f5b7a6
Intended uses & limitations
This model covers only 9 of the 10 artbench-10 classes; it does not include ukiyo_e, due to availability and formatting issues.
Training and evaluation data
Train: 1000 randomly selected images per class from artbench-10. Validation: 100 randomly selected images per class from artbench-10.
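To build a similar split yourself, the `imagefolder` loader in `datasets` constructs a labeled dataset from a one-directory-per-class layout. This is a sketch under assumed paths; the `data` directory and its layout are illustrative, not the authors' actual setup.

```python
from datasets import load_dataset

# Assumes a layout like data/train/<style>/*.jpg and data/validation/<style>/*.jpg
dataset = load_dataset("imagefolder", data_dir="data")

# Class names are inferred from the subdirectory names
print(dataset["train"].features["label"].names)
```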
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
- mixed_precision_training: Native AMP
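As a rough guide, these settings correspond to the following `transformers` `TrainingArguments`. This is a sketch rather than the authors' actual training script; `output_dir` is a placeholder, and the listed Adam betas/epsilon match the Trainer defaults, so no extra optimizer arguments are needed.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-artworkclassifier",  # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=4,
    fp16=True,  # native AMP mixed-precision training
)
```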
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 1.5906        | 0.36  | 100  | 1.4709          | 0.4847   |
| 1.3395        | 0.72  | 200  | 1.3208          | 0.5074   |
| 1.1461        | 1.08  | 300  | 1.3363          | 0.5165   |
| 0.9593        | 1.44  | 400  | 1.1790          | 0.5846   |
| 0.8761        | 1.8   | 500  | 1.1252          | 0.5902   |
| 0.5922        | 2.16  | 600  | 1.1392          | 0.5948   |
| 0.4803        | 2.52  | 700  | 1.1560          | 0.5936   |
| 0.4454        | 2.88  | 800  | 1.1545          | 0.6118   |
| 0.2271        | 3.24  | 900  | 1.2284          | 0.6039   |
| 0.207         | 3.6   | 1000 | 1.2625          | 0.5959   |
| 0.1958        | 3.96  | 1100 | 1.2621          | 0.6005   |
Framework versions
- Transformers 4.26.1
- Pytorch 1.13.1+cu117
- Datasets 2.9.0
- Tokenizers 0.13.2
Usage Examples
Basic Usage
```python
import torch
from transformers import ViTFeatureExtractor, ViTForImageClassification

def vit_classify(image):
    # Load the fine-tuned classifier and switch to inference mode
    vit = ViTForImageClassification.from_pretrained("oschamp/vit-artworkclassifier")
    vit.eval()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    vit.to(device)

    # Preprocess the image with the base model's feature extractor
    model_name_or_path = "google/vit-base-patch16-224-in21k"
    feature_extractor = ViTFeatureExtractor.from_pretrained(model_name_or_path)
    encoding = feature_extractor(images=image, return_tensors="pt")
    pixel_values = encoding["pixel_values"].to(device)

    # Forward pass; return the index of the highest-scoring class
    with torch.no_grad():
        outputs = vit(pixel_values)
    logits = outputs.logits
    prediction = logits.argmax(-1)
    return prediction.item()
```
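The function returns an integer class index. A hypothetical follow-up maps that index back to a style name through the model's `id2label` config; `artwork.jpg` is a placeholder path.

```python
from PIL import Image
from transformers import ViTForImageClassification

image = Image.open("artwork.jpg")  # placeholder path
label_id = vit_classify(image)

# Look up the human-readable style name stored in the model config
vit = ViTForImageClassification.from_pretrained("oschamp/vit-artworkclassifier")
print(vit.config.id2label[label_id])
```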
Technical Details
The model is fine-tuned on the imagefolder dataset, a subset of the artbench-10 dataset. Fine-tuning adjusts the pretrained weights under the hyperparameters above to optimize performance on artwork style classification. The base model [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) provides a solid foundation, and fine-tuning adapts it to the characteristics of the artbench-10 styles.
License
This model is licensed under the Apache-2.0 license.
Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of google/vit-base-patch16-224-in21k for image classification |
| Training Data | Subset of the artbench-10 dataset (imagefolder), with 1000 training images and 100 validation images per class |
| Metrics | Accuracy: 0.5948 on the evaluation set |
| Base Model | google/vit-base-patch16-224-in21k |