🚀 Model vit_base-224-in21k-ft-cifar100
A finetuned model for image classification, trained using Amazon SageMaker and Hugging Face's Deep Learning container.
🚀 Quick Start
This model is a fine-tuned version for image classification. It was trained using Amazon SageMaker and the Hugging Face Deep Learning container. The base model is the Vision Transformer (base-sized model), a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, specifically ImageNet-21k, at a resolution of 224x224 pixels (base checkpoint: google/vit-base-patch16-224-in21k).
✨ Features
- Fine-tuned for image classification: specifically optimized for the image classification task on CIFAR-100.
- Trained with SageMaker and Hugging Face: training was run on Amazon SageMaker using the Hugging Face Deep Learning container.
📦 Installation
No model-specific installation is required beyond the Hugging Face transformers library (plus Pillow and requests for the example below).
💻 Usage Examples
Basic Usage
from transformers import ViTFeatureExtractor, ViTModel
from PIL import Image
import requests

# Download a sample image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the base model's feature extractor and the fine-tuned checkpoint
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTModel.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')

# Preprocess the image and run it through the encoder
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
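The snippet above returns the encoder's hidden states. To obtain an actual CIFAR-100 prediction, the checkpoint can also be loaded with a classification head. The following is a minimal sketch, assuming the checkpoint's config carries the CIFAR-100 id2label mapping:

import torch
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the most likely class; id2label is read from the checkpoint config (assumed to hold CIFAR-100 names)
predicted_id = logits.argmax(-1).item()
print(model.config.id2label[predicted_id])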
📚 Documentation
Base model citation
BibTeX entry and citation info
@misc{wu2020visual,
title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
year={2020},
eprint={2006.03677},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Dataset
Link to dataset description
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. This dataset, CIFAR-100, is similar to CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Sizes of datasets:
- Train dataset: 50,000
- Test dataset: 10,000
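For reference, the dataset can be inspected with the Hugging Face datasets library. This is only a sketch; it assumes the cifar100 Hub dataset with its img, fine_label and coarse_label columns:

from datasets import load_dataset

# Load CIFAR-100 from the Hugging Face Hub
dataset = load_dataset("cifar100")

print(dataset["train"].num_rows)   # 50,000 training images
print(dataset["test"].num_rows)    # 10,000 test images

# Each example carries a "fine" label (100 classes) and a "coarse" label (20 superclasses)
example = dataset["train"][0]
print(example["fine_label"], example["coarse_label"])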
Intended uses & limitations
This model is intended for Image Classification.
Hyperparameters
{
"epochs": "5",
"train_batch_size": "32",
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "1e-05"
}
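These values map directly onto Hugging Face TrainingArguments inside the training script. A minimal sketch of that mapping; the output directory is illustrative, not taken from the original run:

from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters above to TrainingArguments
training_args = TrainingArguments(
    output_dir="./vit_base-224-in21k-ft-cifar100",  # illustrative path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    fp16=True,
    learning_rate=1e-5,
)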
Test results
- Accuracy on the CIFAR-100 test set: 0.9148
🔧 Technical Details
The model is based on the Vision Transformer architecture. It is a fine-tuned version of the base model pretrained on ImageNet-21k. The fine-tuning was carried out using Amazon SageMaker and the Hugging Face Deep Learning container.
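A fine-tuning job like this is typically launched with the SageMaker Python SDK's HuggingFace estimator. The sketch below is illustrative only: the entry point script name, IAM role, instance type and container versions are assumptions, not details from the original training run.

from sagemaker.huggingface import HuggingFace

hyperparameters = {
    "epochs": 5,
    "train_batch_size": 32,
    "eval_batch_size": 8,
    "fp16": True,
    "learning_rate": 1e-5,
}

# Hypothetical estimator configuration; script name, role, versions and instance type are illustrative
huggingface_estimator = HuggingFace(
    entry_point="train.py",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role="<your-sagemaker-execution-role>",
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
    hyperparameters=hyperparameters,
)

huggingface_estimator.fit()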
📄 License
This model is released under the Apache 2.0 license.
📊 Model Information
| Property | Details |
|----------|---------|
| Model Type | Vision Transformer (fine-tuned for image classification) |
| Training Data | CIFAR-100 |
| Metrics | Accuracy |
| Base Model | Vision Transformer (base-sized model) pretrained on ImageNet-21k |
| Results | Accuracy on CIFAR-100: 0.9148 |
Created by Eduardo Muñoz/@edumunozsala