Vit-base-patch16-224-in21k-snacks Open-source Model - Free Deployment for Precise Classification of Snack Images

Vit Base Patch16 224 In21k Snacks

Developed by matteopilotto

A Vision Transformer model pre-trained on ImageNet-21k and fine-tuned specifically for snack image classification tasks

Image Classification

Transformers

#Snack Image Classification #ViT High Accuracy #Data Augmentation Optimization

Downloads 37

Release Time: 5/14/2022

Model Overview

This model is a Vision Transformer pre-trained on ImageNet-21k and fine-tuned on the Matthijs/snacks dataset, specifically designed for snack image classification tasks.

Model Features

High Accuracy Classification

Achieves 89.29% accuracy on the snack test set

Data Augmentation

Training images are augmented with random resized cropping, random horizontal flipping, and random sharpness adjustment

Transfer Learning

Fine-tuned from a ViT checkpoint pre-trained on the large-scale ImageNet-21k dataset

Model Capabilities

Snack Image Classification

Food Recognition

Visual Feature Extraction

Use Cases

Retail & Food Service

Automatic Checkout System

Used in supermarkets to automatically identify snack products selected by customers

Can replace manual scanning, improving checkout efficiency

Food Inventory Management

Automatically identifies snack products on shelves

Helps monitor inventory in real-time

Health & Nutrition

Diet Tracking App

Automatically records snacks consumed by users through photos

Helps users track their eating habits

🚀 Vision Transformer fine-tuned on `Matthijs/snacks` dataset

A Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on the Matthijs/snacks dataset for image classification.

This Vision Transformer (ViT) model was first pre-trained on ImageNet-21k and then fine-tuned on the Matthijs/snacks dataset for 5 epochs. During fine-tuning, several data augmentation transformations from torchvision were applied. The model achieved an accuracy of 94.97% on the validation set and 94.43% on the test set.
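For readers who want to try the fine-tuned checkpoint, the following is a minimal inference sketch, not the author's own code. It assumes the model is loadable through the `image-classification` pipeline of the `transformers` library under the model name listed in the Model Index. The helper names `top_prediction` and `classify_snack` are hypothetical, and the labels and scores in `example` are invented to show the shape of a pipeline result.

```python
def top_prediction(predictions):
    """Return (label, score) of the highest-scoring entry.

    `predictions` is a list of {"label": str, "score": float} dicts,
    which is the format returned by an image-classification pipeline.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify_snack(image, model_id="matteopilotto/vit-base-patch16-224-in21k-snacks"):
    """Classify one image (file path, URL, or PIL image) and return the top label.

    Requires the `transformers` library and network access to download the
    checkpoint, so the import is deferred until the function is called.
    """
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return top_prediction(classifier(image))


# Hand-written result in the pipeline's output format (values are invented)
example = [
    {"label": "popcorn", "score": 0.91},
    {"label": "pretzel", "score": 0.05},
]
best_label, best_score = top_prediction(example)
```

Calling `classify_snack("snack.jpg")` with a local image path of your own would download the checkpoint on first use and return the most likely class with its score.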

📚 Documentation

Datasets

Matthijs/snacks

Model Index

Property	Details
Model Name	matteopilotto/vit-base-patch16-224-in21k-snacks
Task Type	Image Classification
Dataset Name	Matthijs/snacks
Dataset Split	test
Accuracy	0.8928571428571429
Precision Macro	0.8990033704680036
Precision Micro	0.8928571428571429
Precision Weighted	0.8972398709051788
Recall Macro	0.8914608843537415
Recall Micro	0.8928571428571429
Recall Weighted	0.8928571428571429
F1 Macro	0.892544821273258
F1 Micro	0.8928571428571429
F1 Weighted	0.8924168605019522
Loss	0.479541540145874
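Several rows in the table share the value 0.8928571428571429. That is expected for single-label multi-class classification: micro-averaged precision, micro-averaged recall, and support-weighted recall all reduce to plain accuracy. The sketch below illustrates the three averaging modes on a handful of invented labels (not the snacks data); the function name `averaged_scores` is hypothetical.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Compute accuracy plus macro, micro, and weighted precision/recall."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class

    n = len(y_true)
    total_tp = sum(tp.values())
    precision = {c: tp[c] / predicted[c] if predicted[c] else 0.0 for c in classes}
    recall = {c: tp[c] / support[c] if support[c] else 0.0 for c in classes}

    return {
        "accuracy": total_tp / n,
        # macro: unweighted mean over classes
        "precision_macro": sum(precision.values()) / len(classes),
        "recall_macro": sum(recall.values()) / len(classes),
        # micro: pool all decisions first, then divide
        "precision_micro": total_tp / sum(predicted.values()),
        "recall_micro": total_tp / sum(support.values()),
        # weighted: mean over classes, weighted by class support
        "precision_weighted": sum(precision[c] * support[c] for c in classes) / n,
        "recall_weighted": sum(recall[c] * support[c] for c in classes) / n,
    }


# Toy labels, invented purely for illustration
y_true = ["apple", "apple", "banana", "banana", "banana", "cookie"]
y_pred = ["apple", "banana", "banana", "banana", "cookie", "cookie"]
scores = averaged_scores(y_true, y_pred)
```

On these toy labels, `precision_micro`, `recall_micro`, and `recall_weighted` all coincide with `accuracy`, while the macro and weighted precision values differ, mirroring the pattern in the table.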

💻 Usage Examples

Basic Usage

The following code block shows the data augmentation pipeline used during pre-processing to augment the original dataset. The augmented images were generated on-the-fly with the set_transform method.

from transformers import ViTFeatureExtractor
from torchvision.transforms import (
    Compose,
    Normalize,
    Resize,
    RandomResizedCrop,
    RandomHorizontalFlip,
    RandomAdjustSharpness,
    ToTensor,
)

checkpoint = 'google/vit-base-patch16-224-in21k'
feature_extractor = ViTFeatureExtractor.from_pretrained(checkpoint)

# Older transformers releases expose `size` as an int (224); newer ones
# return a dict such as {"height": 224, "width": 224}. Handle both.
size = feature_extractor.size
if isinstance(size, dict):
    size = (size["height"], size["width"])
else:
    size = (size, size)

# transformations on the training set (random augmentation)
train_aug_transforms = Compose([
    RandomResizedCrop(size=size),
    RandomHorizontalFlip(p=0.5),
    RandomAdjustSharpness(sharpness_factor=5, p=0.5),
    ToTensor(),
    Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),
])

# transformations on the validation/test set (deterministic, no augmentation)
valid_aug_transforms = Compose([
    Resize(size=size),
    ToTensor(),
    Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),
])
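The text mentions that augmented images were produced on-the-fly with `set_transform`, but the wiring is not shown. A minimal sketch of that step follows, under these assumptions: the dataset is loaded with the `datasets` library, its image column is named `image`, and it has `train`, `validation`, and `test` splits. The helper `make_batch_transform` and the output column name `pixel_values` are hypothetical choices, not taken from the original training script.

```python
def make_batch_transform(transform, image_key="image", output_key="pixel_values"):
    """Wrap a per-image transform into the batch-level function that
    `datasets.Dataset.set_transform` expects.

    The returned function receives a dict mapping column names to lists
    and must return a dict of the same kind.
    """
    def apply(batch):
        # Convert to RGB first so grayscale or RGBA images do not break Normalize
        batch[output_key] = [transform(img.convert("RGB")) for img in batch[image_key]]
        return batch

    return apply


# Hypothetical wiring (needs the `datasets` library and network access):
# from datasets import load_dataset
# dataset = load_dataset("Matthijs/snacks")
# dataset["train"].set_transform(make_batch_transform(train_aug_transforms))
# dataset["validation"].set_transform(make_batch_transform(valid_aug_transforms))
# dataset["test"].set_transform(make_batch_transform(valid_aug_transforms))
```

Because `set_transform` applies the function each time rows are accessed, every epoch sees a freshly augmented version of each training image without storing augmented copies on disk.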
