# swinv2-tiny-patch4-window8-256-finetuned-THFOOD-50
This model is a fine-tuned version of [microsoft/swinv2-tiny-patch4-window8-256](https://huggingface.co/microsoft/swinv2-tiny-patch4-window8-256) on the THFOOD-50 dataset. It classifies food images into the 50 THFOOD-50 categories with high accuracy.
## 🚀 Quick Start
The model can be used directly through the Hugging Face Transformers library, either with the `image-classification` pipeline (sketched below) or by loading the model and preprocessor explicitly (see Usage Examples).
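A minimal sketch using the high-level `pipeline` API; here `your_model_name` stands in for this model's Hub id, and the sample image URL is the one used in the usage example below:

```python
from transformers import pipeline

# Build an image-classification pipeline from this checkpoint
# ("your_model_name" is a placeholder for the model's Hub id).
classifier = pipeline("image-classification", model="your_model_name")

# The pipeline accepts local paths, URLs, or PIL images.
url = "https://huggingface.co/datasets/thean/sample_images/resolve/main/FriedChicken.jpg"
for prediction in classifier(url):
    print(prediction["label"], round(prediction["score"], 3))
```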
## ✨ Features
- **Fine-tuned**: Based on the pre-trained [microsoft/swinv2-tiny-patch4-window8-256](https://huggingface.co/microsoft/swinv2-tiny-patch4-window8-256) model, fine-tuned on the THFOOD-50 dataset for improved accuracy on food image classification.
- **High accuracy**: Achieves above 92% accuracy on the training, validation, and test splits (see Model Performance below).
## 📦 Installation
Since this model builds on the Transformers library, install the necessary packages with:

```bash
pip install transformers datasets torch
```
## 💻 Usage Examples
Here is a simple example of using this model for inference:

```python
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
import torch
from PIL import Image
import requests

# Load the preprocessor and the fine-tuned model
# ("your_model_name" is a placeholder for the model's Hub id).
feature_extractor = AutoFeatureExtractor.from_pretrained("your_model_name")
model = AutoModelForImageClassification.from_pretrained("your_model_name")

# Download a sample image.
url = "https://huggingface.co/datasets/thean/sample_images/resolve/main/FriedChicken.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run inference without tracking gradients.
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit to its class label.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
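To inspect the model's confidence rather than just the top class, the logits from the example above can be passed through a softmax; a small sketch continuing from that code:

```python
# Continuing from the example above: top-5 classes with probabilities.
probs = logits.softmax(-1)[0]
top5 = probs.topk(5)
for score, idx in zip(top5.values.tolist(), top5.indices.tolist()):
    print(f"{model.config.id2label[idx]}: {score:.3f}")
```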
## 📚 Documentation
### Model Performance

This model achieves the following results (an evaluation sketch follows the list):

- Train set
  - Loss: 0.1669
  - Accuracy: 0.9557
- Validation set
  - Loss: 0.2535
  - Accuracy: 0.9344
- Test set
  - Loss: 0.2669
  - Accuracy: 0.9292
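A sketch of how metrics like these could be reproduced with `Trainer.evaluate`, assuming the dataset is available on the Hub; the dataset id `thean/THFOOD-50`, the split name, and the `image`/`label` column names are hypothetical placeholders:

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoFeatureExtractor,
                          AutoModelForImageClassification, Trainer)

# Hypothetical dataset id and split; substitute the actual THFOOD-50 source.
dataset = load_dataset("thean/THFOOD-50", split="test")

feature_extractor = AutoFeatureExtractor.from_pretrained("your_model_name")
model = AutoModelForImageClassification.from_pretrained("your_model_name")

def preprocess(batch):
    # Convert PIL images to pixel-value tensors; assumes "image"/"label" columns.
    inputs = feature_extractor(images=batch["image"], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

dataset = dataset.map(preprocess, batched=True,
                      remove_columns=["image", "label"])

def compute_metrics(eval_pred):
    # Simple top-1 accuracy over the logits.
    predictions = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": (predictions == eval_pred.label_ids).mean()}

trainer = Trainer(model=model, compute_metrics=compute_metrics)
print(trainer.evaluate(dataset))
```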
### Training and Evaluation Data

The model is trained and evaluated on the THFOOD-50 dataset. More detailed information about the data is not yet available.
### Training Procedure

#### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows this list):
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
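For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments`; the output directory name is illustrative:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; "output_dir" is illustrative.
training_args = TrainingArguments(
    output_dir="swinv2-tiny-finetuned-thfood-50",
    learning_rate=5e-05,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,  # 64 * 4 = 256 effective train batch size
    num_train_epochs=20,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```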
#### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---------------|-------|------|-----------------|----------|
| 3.6558        | 0.99  | 47   | 3.1956          | 0.28     |
| 1.705         | 1.99  | 94   | 1.1701          | 0.6787   |
| 0.9805        | 2.98  | 141  | 0.6492          | 0.8125   |
| 0.7925        | 4.0   | 189  | 0.4724          | 0.8644   |
| 0.6169        | 4.99  | 236  | 0.4129          | 0.8738   |
| 0.5343        | 5.99  | 283  | 0.3717          | 0.8825   |
| 0.5196        | 6.98  | 330  | 0.3654          | 0.8906   |
| 0.5059        | 8.0   | 378  | 0.3267          | 0.8969   |
| 0.4432        | 8.99  | 425  | 0.2996          | 0.9081   |
| 0.3819        | 9.99  | 472  | 0.3056          | 0.9087   |
| 0.3627        | 10.98 | 519  | 0.2796          | 0.9213   |
| 0.3505        | 12.0  | 567  | 0.2753          | 0.915    |
| 0.3224        | 12.99 | 614  | 0.2830          | 0.9206   |
| 0.3206        | 13.99 | 661  | 0.2797          | 0.9231   |
| 0.3141        | 14.98 | 708  | 0.2569          | 0.9287   |
| 0.2946        | 16.0  | 756  | 0.2582          | 0.9319   |
| 0.3008        | 16.99 | 803  | 0.2583          | 0.9337   |
| 0.2356        | 17.99 | 850  | 0.2567          | 0.9281   |
| 0.2954        | 18.98 | 897  | 0.2581          | 0.9319   |
| 0.2628        | 19.89 | 940  | 0.2535          | 0.9344   |
### Framework versions
- Transformers 4.28.1
- Pytorch 2.0.0+cu118
- Datasets 2.11.0
- Tokenizers 0.13.3
## 📄 License

This model is licensed under the AFL-3.0 license.