Open-source Cat and Dog Image Classification Model: vit-base-patch16-224-in21k, Fine-tuned to 99% Accuracy on the Evaluation Set

Vit Base Patch16 224 In21k Dog Vs Cat Image Classification

Developed by DunnBC22

A cat-vs-dog image classification model fine-tuned from Google's Vision Transformer (ViT), achieving 99% accuracy on the evaluation set

Image Classification

Transformers

Language: English
Open Source License: Apache-2.0
Tags: #High-precision image classification #ViT fine-tuned model #Pet recognition

Downloads 20

Release Time: 1/11/2023

Model Overview

This is a binary classification model that distinguishes between cats and dogs. It is fine-tuned from a pre-trained ViT model and is suited to simple image classification tasks.

Model Features

High accuracy

Achieves 99% accuracy and an F1 score of 0.9897 on the cat-vs-dog evaluation set

Based on ViT architecture

Uses the Vision Transformer base architecture, suitable for image classification tasks

Lightweight fine-tuning

Reaches this performance after only 3 training epochs at a learning rate of 0.0002

Model Capabilities

Image classification

Binary classification

Animal recognition

Use Cases

Pet recognition

Cat-dog classification

Automatically identifies whether the image contains a cat or a dog

99% accuracy

Content management

Pet image classification

Automatically classifies uploaded cat and dog images for pet picture websites
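
For the content-management use case above, one possible way to act on the model's output is to accept a label only when its score is high and send everything else to manual review. The sketch below assumes predictions arrive as a list of `{"label", "score"}` dictionaries (the shape returned by the Transformers image-classification pipeline). The function name `route_upload`, the `"needs_review"` bucket, and the 0.9 threshold are hypothetical choices for illustration, not part of the model.

```python
from typing import Dict, List


def route_upload(predictions: List[Dict], min_score: float = 0.9) -> str:
    """Return the predicted label, or "needs_review" when confidence is low.

    `predictions` is a list of {"label": str, "score": float} dicts.
    The threshold and the "needs_review" bucket are illustrative assumptions.
    """
    if not predictions:
        return "needs_review"
    # Pick the entry with the highest score.
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else "needs_review"
```

A threshold like this trades automation for safety: raising `min_score` sends more images to review but lowers the chance of a wrong label being published.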

🚀 vit-base-patch16-224-in21k_dog_vs_cat_image_classification

This model is a fine-tuned image-classification model that distinguishes between cats and dogs. It is based on the pre-trained model [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) and achieves high accuracy on the evaluation set.

🚀 Quick Start

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k). It achieves the following results on the evaluation set:

Loss: 0.0404
Accuracy: 0.99
F1: 0.9897
Recall: 0.9909
Precision: 0.9885
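
A minimal sketch of running the model on a single image with the Transformers `pipeline` API is shown here. The Hub ID in `MODEL_ID` is an assumption, built from the developer name and model name given on this page. The helper names `top_prediction` and `classify_image` are hypothetical.

```python
from typing import Dict, List

# Assumed Hub ID: developer name + model name as listed on this page.
MODEL_ID = "DunnBC22/vit-base-patch16-224-in21k_dog_vs_cat_image_classification"


def top_prediction(predictions: List[Dict]) -> Dict:
    """Return the highest-scoring entry from a list of {"label", "score"} dicts."""
    return max(predictions, key=lambda p: p["score"])


def classify_image(image_path: str) -> Dict:
    """Classify one image file; the model is downloaded on first use."""
    # Imported lazily so top_prediction works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return top_prediction(classifier(image_path))
```

Calling `classify_image("my_pet.jpg")` (a hypothetical local file) would return a single dictionary with the predicted label and its score. The exact label strings depend on the model's configuration.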

✨ Features

Binary Classification: Specifically designed for distinguishing between cats and dogs.
High Performance: Achieves high accuracy, F1 score, recall, and precision on the evaluation set.

📚 Documentation

Model description

This is a binary classification model to distinguish between cats and dogs.

For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Binary%20Classification/Dogs%20or%20Cats%20Image%20Classification/Dog_v_Cat_ViT.ipynb

Intended uses & limitations

This model is intended to demonstrate my ability to solve a complex problem using technology.

Training and evaluation data

Dataset Source: https://www.kaggle.com/datasets/shaunthesheep/microsoft-catsvsdogs-dataset

Sample Images From Dataset: [image omitted]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
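
The hyperparameters above can be expressed as keyword arguments for `transformers.TrainingArguments`. This is a sketch under assumptions, not the author's exact configuration: it assumes a single device (so the listed batch sizes map to the `per_device_*` arguments), assumes the optimizer settings map to the Trainer's built-in `adam_*` arguments, and assumes zero warmup steps because none are listed. `build_training_args` and `linear_lr` are hypothetical helper names.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
TRAINING_KWARGS = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 8,    # assumes a single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}


def build_training_args(output_dir: str):
    """Build TrainingArguments from the values above (requires transformers)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-4,
              warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule: ramp up during warmup, then decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With the 3750 total steps reported in the training results, `linear_lr(1875, 3750)` gives the rate at the midpoint of training, which is half the base rate under the zero-warmup assumption.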

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1	Recall	Precision
0.0896	1.0	1250	0.0590	0.979	0.9783	0.9728	0.9838
0.0253	2.0	2500	0.0543	0.9842	0.9837	0.9802	0.9871
0.0066	3.0	3750	0.0404	0.99	0.9897	0.9909	0.9885
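
As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so each reported F1 value can be recomputed from its row. The sketch below also estimates the training-set size from the step counts. That estimate assumes every step processes a full batch of 16 on a single device with no gradient accumulation. The variable and function names are illustrative.

```python
# (epoch, step, precision, recall, reported F1) copied from the table above.
ROWS = [
    (1, 1250, 0.9838, 0.9728, 0.9783),
    (2, 2500, 0.9871, 0.9802, 0.9837),
    (3, 3750, 0.9885, 0.9909, 0.9897),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for epoch, step, precision, recall, reported in ROWS:
    computed = f1_score(precision, recall)
    print(f"epoch {epoch}: computed F1 = {computed:.4f}, reported = {reported}")

# 1250 steps per epoch at batch size 16 (assumed full batches, one device).
steps_per_epoch = ROWS[0][1]
approx_train_images = steps_per_epoch * 16
print(f"approximate training images per epoch: {approx_train_images}")
```

Small differences in the last decimal place are expected, because the precision and recall values in the table are themselves rounded.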

Framework versions

Transformers 4.25.1
Pytorch 1.12.1
Datasets 2.8.0
Tokenizers 0.12.1
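
To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` is one option. The distribution names (for example `torch` for Pytorch) are assumptions about how these packages are published on PyPI, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Name on this page -> (assumed PyPI distribution name, version listed above).
EXPECTED = {
    "Transformers": ("transformers", "4.25.1"),
    "Pytorch": ("torch", "1.12.1"),
    "Datasets": ("datasets", "2.8.0"),
    "Tokenizers": ("tokenizers", "0.12.1"),
}


def compare_versions(expected=EXPECTED):
    """Return {name: (installed_or_None, wanted, matches)} for each package."""
    report = {}
    for name, (dist, wanted) in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu113" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report
```

Newer versions will often work as well; a mismatch reported here is a hint for troubleshooting rather than an error.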

📄 License

This model is licensed under the Apache 2.0 license.

📦 Information Table

Property	Details
Model Type	Fine-tuned image-classification model
Training Data	From https://www.kaggle.com/datasets/shaunthesheep/microsoft-catsvsdogs-dataset
Metrics	Accuracy, F1, Recall, Precision
Evaluation Results	Loss: 0.0404, Accuracy: 0.99, F1: 0.9897, Recall: 0.9909, Precision: 0.9885
