Open-source ViT fine-tuned Food101 model - Free deployment for accurate food image classification

Vit Finetuned Food101

Developed by ashaduzzaman

This is a Vision Transformer model fine-tuned on the Food-101 dataset for food image classification tasks.

Image Classification

TensorBoard

Open Source License: Apache-2.0
Tags: Food Image Classification, High-precision ViT, Dining Scenarios

Release Time : 8/28/2024

Model Overview

Built on Google's ViT architecture and fine-tuned to classify 101 food categories, this model suits scenarios such as diet tracking and restaurant menu analysis.

Model Features

High-Accuracy Food Classification

Achieves 89.6% accuracy on its Food-101 evaluation set, classifying images into 101 food categories.

ViT-Based Architecture

Utilizes the Vision Transformer architecture with self-attention mechanisms to capture global image features.

Transfer Learning Optimization

Fine-tuned from a pre-trained ViT model, effectively leveraging features learned from large-scale image data.

Model Capabilities

Food Image Classification

Multi-category Recognition

Diet Analysis

Use Cases

Diet & Health

Automatic Food Logging

Helps users automatically record dietary content by taking photos

Accurately identifies 101 common food items

Food & Beverage Industry

Menu Analysis

Automatically analyzes food categories in restaurant menus

🚀 ViT Fine-tuned on Food-101

This model is based on the Vision Transformer (ViT) architecture, fine-tuned on the Food-101 dataset for image classification tasks, especially for food item recognition and categorization.

🚀 Quick Start

To run inference with this model, load an image (here, downloaded from a URL) and classify it as follows:

from io import BytesIO

import requests
from PIL import Image
from transformers import pipeline

# Download a sample image (replace the placeholder URL with your own)
image_url = "https://example.com/path-to-your-image.jpg"
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # Fail early on HTTP errors
image = Image.open(BytesIO(response.content)).convert("RGB")  # Ensure 3-channel input

# Load the fine-tuned model for image classification
classifier = pipeline(
    "image-classification",
    model="ashaduzzaman/vit-finetuned-food101"
)

# Run inference; the result is a list of {"label", "score"} dictionaries
result = classifier(image)
print(result)
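
The pipeline returns a list of dictionaries, each holding a `label` and a `score`. The sketch below shows one way to reduce that list to a single decision and to reject low-confidence predictions, which can be useful for images outside the 101 categories. The `top_prediction` helper, the 0.5 threshold, and the sample scores are illustrative assumptions, not part of the model card.

```python
from typing import Optional


def top_prediction(result: list[dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if no score reaches min_score."""
    if not result:
        return None
    best = max(result, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None


# Hypothetical output in the shape the pipeline returns (scores are made up)
sample_result = [
    {"label": "pizza", "score": 0.91},
    {"label": "lasagna", "score": 0.04},
    {"label": "bruschetta", "score": 0.02},
]

print(top_prediction(sample_result))                  # accepted at the default threshold
print(top_prediction(sample_result, min_score=0.95))  # rejected at a stricter threshold
```

A suitable threshold depends on the application and would need to be tuned on representative images.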

✨ Features

High Accuracy: Achieves an accuracy of 89.6% on the evaluation set.
Specific Task: Designed specifically for classifying images into one of 101 food categories.

📚 Documentation

Model Overview

This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the Food-101 dataset. The Vision Transformer (ViT) architecture is leveraged for image classification tasks, particularly for recognizing and categorizing food items.

Model Details

Property	Details
Model Type	Vision Transformer (ViT)
Base Model	google/vit-base-patch16-224-in21k
Fine-tuning Dataset	Food-101
Number of Labels	101 (corresponding to different food categories)

Performance

The model achieves the following results on the evaluation set:

Loss: 1.6262
Accuracy: 89.6%

Intended Uses & Limitations

Intended Use Cases

Image Classification: This model is designed for classifying images into one of 101 food categories, making it suitable for applications like food recognition in diet tracking, restaurant menu analysis, or food-related search engines.

Limitations

Dataset Bias: The model's performance may degrade when applied to food images that are significantly different from those in the Food-101 dataset, such as non-Western cuisines or images captured in non-standard conditions.
Generalization: While the model performs well on the Food-101 dataset, its ability to generalize to other food-related tasks or datasets is not guaranteed.
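Input Size: The model expects 224x224-pixel inputs. When you use the pipeline, the image processor bundled with the checkpoint resizes images for you; if you pass tensors to the model directly, resize them to 224x224 first.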
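
To see why the input size matters, it helps to count the tokens the transformer processes. The sketch below assumes the 16-pixel patch size and 224-pixel resolution encoded in the base checkpoint name (google/vit-base-patch16-224-in21k) and the standard ViT layout of one [CLS] token plus one token per patch. The `vit_sequence_length` helper is hypothetical and only illustrates the arithmetic.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens for a square image: one per patch plus the [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1  # +1 for the [CLS] token


print(vit_sequence_length())  # 14 x 14 patches + 1 = 197 tokens
```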

Training and Evaluation Data

The model was fine-tuned on the Food-101 dataset, which consists of 101,000 images across 101 different food categories. Each category contains 1,000 images, with 750 used for training and 250 for testing. The dataset includes diverse food items but may be skewed towards certain cuisines or food types.

Training Procedure

Training Hyperparameters

The model was fine-tuned using the following hyperparameters:

Learning Rate: 5e-05
Train Batch Size: 16
Eval Batch Size: 16
Seed: 42
Gradient Accumulation Steps: 4
Total Train Batch Size: 64
Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
Learning Rate Scheduler: Linear with a warmup ratio of 0.1
Number of Epochs: 3
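
The total train batch size follows from the per-device batch size and gradient accumulation (16 x 4 = 64). The sketch below also illustrates a linear schedule with warmup under stated assumptions: the total of 186 optimizer steps is taken from the last row of the training results table, and rounding the warmup length up to a whole step is an assumption. The `linear_warmup_lr` helper is hypothetical and is not the training code used for this model.

```python
import math

TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 4
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 64


def linear_warmup_lr(step: int, base_lr: float = 5e-5,
                     total_steps: int = 186, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 10, 19, 100, 186):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the rate climbs from zero to 5e-05 over the first 19 steps and then decays linearly to zero at step 186.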

Training Results

Training Loss	Epoch	Step	Validation Loss	Accuracy
2.7649	0.992	62	2.5733	0.831
1.888	2.0	125	1.7770	0.883
1.6461	2.976	186	1.6262	0.896
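
The logged steps and epochs imply how many examples made up one training epoch. The arithmetic below, using only numbers from this page, suggests about 4,000 examples per epoch, far fewer than the 75,750 images in the full Food-101 training split (101 x 750). This may mean the run used a subset of the data; the model card does not say so, so treat it as an inference from the table rather than a documented fact.

```python
TOTAL_TRAIN_BATCH_SIZE = 16 * 4  # train batch size x gradient accumulation steps

# (epoch, step) pairs copied from the training results table
training_log = [(0.992, 62), (2.0, 125), (2.976, 186)]

# Examples per epoch implied by each row: steps per epoch x examples per step
implied_examples = [
    round(step / epoch * TOTAL_TRAIN_BATCH_SIZE) for epoch, step in training_log
]

# Size of the full Food-101 training split: 101 classes x 750 images
full_train_split = 101 * 750

print(implied_examples)
print(full_train_split)
```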

Framework Versions

Transformers: 4.42.4
PyTorch: 2.4.0+cu121
Datasets: 2.21.0
Tokenizers: 0.19.1

Ethical Considerations

Bias: The Food-101 dataset primarily consists of popular Western dishes, which may introduce bias in the model’s predictions for non-Western food items.
Privacy: When using this model in applications, ensure that the images are sourced ethically and that privacy considerations are respected.

Citation

If you use this model in your work, please cite it as:

@misc{vit_finetuned_food101,
  author = {Ashaduzzaman},
  title = {ViT Fine-tuned on Food-101},
  year = {2024},
  url = {https://huggingface.co/ashaduzzaman/vit-finetuned-food101},
}

📄 License

The model is licensed under the Apache 2.0 license.
