🚀 Model vit_base-224-in21k-ft-cifar100
A finetuned model for image classification, trained using Amazon SageMaker and Hugging Face's Deep Learning container.
🚀 Quick Start
This model is a fine-tuned version for image classification. It was trained using Amazon SageMaker and the Hugging Face Deep Learning container. The base model is the Vision Transformer (base-sized model), a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, specifically ImageNet-21k, at a resolution of 224x224 pixels (base checkpoint: google/vit-base-patch16-224-in21k).
✨ Features
- Fine-tuned for image classification: specifically optimized for the image classification task on CIFAR-100.
- Trained with SageMaker and Hugging Face: training was run on Amazon SageMaker using the Hugging Face Deep Learning container.
📦 Installation
No model-specific installation is required beyond the Hugging Face transformers library (plus Pillow and requests for the example below).
💻 Usage Examples
Basic Usage
from transformers import ViTFeatureExtractor, ViTModel
from PIL import Image
import requests

# Download a sample image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the base model's feature extractor and the fine-tuned checkpoint
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTModel.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')

# Preprocess the image and run it through the encoder
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
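The snippet above returns the encoder's hidden states. To obtain an actual CIFAR-100 prediction, the checkpoint can also be loaded with a classification head. The following is a minimal sketch, assuming the checkpoint's config carries the CIFAR-100 id2label mapping:

import torch
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the most likely class; id2label is read from the checkpoint config (assumed to hold CIFAR-100 names)
predicted_id = logits.argmax(-1).item()
print(model.config.id2label[predicted_id])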
📚 Documentation
Base model citation
BibTeX entry and citation info
@misc{wu2020visual,
title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
year={2020},
eprint={2006.03677},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Dataset
Link to dataset description
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. This dataset, CIFAR-100, is similar to CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Sizes of datasets:
- Train dataset: 50,000
- Test dataset: 10,000
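For reference, the dataset can be inspected with the Hugging Face datasets library. This is only a sketch; it assumes the cifar100 Hub dataset with its img, fine_label and coarse_label columns:

from datasets import load_dataset

# Load CIFAR-100 from the Hugging Face Hub
dataset = load_dataset("cifar100")

print(dataset["train"].num_rows)   # 50,000 training images
print(dataset["test"].num_rows)    # 10,000 test images

# Each example carries a "fine" label (100 classes) and a "coarse" label (20 superclasses)
example = dataset["train"][0]
print(example["fine_label"], example["coarse_label"])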
Intended uses & limitations
This model is intended for Image Classification.
Hyperparameters
{
"epochs": "5",
"train_batch_size": "32",
"eval_batch_size": "8",
"fp16": "true",
"learning_rate": "1e-05"
}
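These values map directly onto Hugging Face TrainingArguments inside the training script. A minimal sketch of that mapping; the output directory is illustrative, not taken from the original run:

from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters above to TrainingArguments
training_args = TrainingArguments(
    output_dir="./vit_base-224-in21k-ft-cifar100",  # illustrative path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    fp16=True,
    learning_rate=1e-5,
)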
Test results
- Accuracy on the CIFAR-100 test set: 0.9148
🔧 Technical Details
The model is based on the Vision Transformer architecture. It is a fine-tuned version of the base model pretrained on ImageNet-21k. The fine-tuning was carried out using Amazon SageMaker and the Hugging Face Deep Learning container.
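A fine-tuning job like this is typically launched with the SageMaker Python SDK's HuggingFace estimator. The sketch below is illustrative only: the entry point script name, IAM role, instance type and container versions are assumptions, not details from the original training run.

from sagemaker.huggingface import HuggingFace

hyperparameters = {
    "epochs": 5,
    "train_batch_size": 32,
    "eval_batch_size": 8,
    "fp16": True,
    "learning_rate": 1e-5,
}

# Hypothetical estimator configuration; script name, role, versions and instance type are illustrative
huggingface_estimator = HuggingFace(
    entry_point="train.py",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role="<your-sagemaker-execution-role>",
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
    hyperparameters=hyperparameters,
)

huggingface_estimator.fit()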
📄 License
This model is released under the Apache 2.0 license.
📊 Model Information
| Property | Details |
|----------|---------|
| Model Type | Vision Transformer (fine-tuned for image classification) |
| Training Data | CIFAR-100 |
| Metrics | Accuracy |
| Base Model | Vision Transformer (base-sized model) pretrained on ImageNet-21k |
| Results | Accuracy on CIFAR-100: 0.9148 |
Created by Eduardo Muñoz/@edumunozsala