vit-base-patch16-224-Futurama_Image_multilabel_clf Open Source Model - Precisely Identify the Content of "Futurama" Screenshots

Developed by DunnBC22

A multi-label image classification model fine-tuned from Google's Vision Transformer (ViT) to identify the content of screenshots from the animated series 'Futurama'.

Image Classification

Transformers

Language: English
License: Apache-2.0
Tags: Animated Screenshot Classification, Multi-label Recognition, High-precision ViT

Downloads 19

Release date: 2/16/2023

Model Overview

This model is a fine-tuned version of google/vit-base-patch16-224 for multi-label classification of 'Futurama' screenshots. On the evaluation set it reaches an F1 score of 0.9818.

Model Features

High-precision Multi-label Classification

Achieved an F1 score of 0.9818 and an accuracy of 0.9672 on the 'Futurama' screenshot dataset.

Based on ViT Architecture

Uses the Vision Transformer (ViT) base architecture, which splits each 224x224 input image into 16x16 patches.

Fine-tuned

Fine-tuned for 8 epochs, with training loss falling from 0.2456 to 0.0005.

Model Capabilities

Image Classification

Multi-label Recognition

Animated Scene Analysis

Use Cases

Media Content Analysis

Animated Scene Classification

Automatically identifies scene content in 'Futurama' animations

Accuracy of 96.72% on the evaluation set

Content Moderation

Identifies specific content or characters in animations

🚀 vit-base-patch16-224-Futurama_Image_multilabel_clf

This model is a fine-tuned version of google/vit-base-patch16-224, designed for image classification tasks.

🚀 Quick Start

This model is a fine-tuned version of google/vit-base-patch16-224.

It achieves the following results on the evaluation set:

Loss: 0.0592
F1: 0.9818
ROC AUC: 0.9842
Accuracy: 0.9672
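
The card does not say how F1 and accuracy were aggregated across labels. As a rough illustration only, the sketch below assumes micro-averaged F1 and exact-match (subset) accuracy, two common choices for multi-label evaluation. The label vectors are made up for the example and the function names are illustrative.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Made-up data: 3 screenshots, 3 binary labels each (not from the real dataset).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]

print("micro F1:", micro_f1(y_true, y_pred))
print("subset accuracy:", subset_accuracy(y_true, y_pred))
```

Under these assumptions, one wrong label spoils the whole sample for subset accuracy but only one cell for micro F1, which would be consistent with the reported accuracy (0.9672) sitting below the reported F1 (0.9818).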

✨ Features

Model description

This is a multilabel classification model of screenshot images from the show Futurama.
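
Because the model is multi-label, each label gets an independent sigmoid score rather than sharing a softmax, and several labels can be active for one screenshot. The following is a minimal inference sketch, not the author's code. It assumes the model is published on the Hugging Face Hub under the ID `DunnBC22/vit-base-patch16-224-Futurama_Image_multilabel_clf` (inferred from the developer and model name), and that a 0.5 threshold is appropriate (the card does not state one). `decode_multilabel` and `classify_screenshot` are illustrative names.

```python
import math

# Assumed Hub ID, inferred from the developer and model name on this page.
MODEL_ID = "DunnBC22/vit-base-patch16-224-Futurama_Image_multilabel_clf"


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, id2label, threshold=0.5):
    """Return {label: probability} for labels whose sigmoid score meets the threshold."""
    scores = {id2label[i]: sigmoid(z) for i, z in enumerate(logits)}
    return {name: p for name, p in scores.items() if p >= threshold}


def classify_screenshot(image_path, threshold=0.5):
    """Load the model and return the labels predicted for one image file."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageClassification.from_pretrained(MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_multilabel(logits, model.config.id2label, threshold)
```

Calling `classify_screenshot("frame.png")` (a hypothetical file path) would download the weights on first use and return a dictionary of the labels that passed the threshold.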

For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/blob/main/Computer%20Vision/Image%20Classification/Multilabel%20Classification/Futurama%20Screenshots/Futurama%20-%20ML%20Image%20CLF.ipynb

Intended uses & limitations

This model is intended to demonstrate my ability to solve a complex problem using technology.

📚 Documentation

Training and evaluation data

Dataset Source: https://www.kaggle.com/datasets/gonzalorecioc/futurama-frames-with-characteronscreen-data

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 8
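
With `lr_scheduler_type: linear`, the learning rate decays in a straight line from its initial value toward zero over the run. The sketch below shows that schedule under two assumptions: no warmup steps (none are listed above) and 7328 total optimizer steps (8 epochs of 916 steps, as in the training results). The function name is illustrative.

```python
BASE_LR = 2e-05
STEPS_PER_EPOCH = 916   # from the training results table
TOTAL_STEPS = 7328      # 8 epochs x 916 steps
WARMUP_STEPS = 0        # assumption: the card lists no warmup


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at the start of every second epoch and at the end.
for epoch in range(0, 9, 2):
    print(f"epoch {epoch}: lr = {linear_lr(epoch * STEPS_PER_EPOCH):.2e}")
```

Under these assumptions the rate is halved by the end of epoch 4 and reaches zero at the final step.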

Training results

Training Loss	Epoch	Step	Validation Loss	F1	ROC AUC	Accuracy
0.2456	1.0	916	0.0723	0.9711	0.9746	0.9481
0.0269	2.0	1832	0.0545	0.9799	0.9818	0.9640
0.0086	3.0	2748	0.0580	0.9794	0.9814	0.9623
0.0044	4.0	3664	0.0612	0.9814	0.9832	0.9651
0.0027	5.0	4580	0.0592	0.9818	0.9842	0.9672
0.0017	6.0	5496	0.0634	0.9800	0.9832	0.9645
0.0012	7.0	6412	0.0657	0.9817	0.9840	0.9667
0.0005	8.0	7328	0.0668	0.9812	0.9836	0.9667
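
The evaluation results reported earlier (loss 0.0592, F1 0.9818) match the epoch 5 row rather than the final epoch. This suggests, though the card does not state it, that the checkpoint with the best F1 was the one kept. The short sketch below copies the table into Python and picks out the best rows. The variable names are illustrative.

```python
# (epoch, training_loss, validation_loss, f1, roc_auc, accuracy), copied from the table above
RESULTS = [
    (1, 0.2456, 0.0723, 0.9711, 0.9746, 0.9481),
    (2, 0.0269, 0.0545, 0.9799, 0.9818, 0.9640),
    (3, 0.0086, 0.0580, 0.9794, 0.9814, 0.9623),
    (4, 0.0044, 0.0612, 0.9814, 0.9832, 0.9651),
    (5, 0.0027, 0.0592, 0.9818, 0.9842, 0.9672),
    (6, 0.0017, 0.0634, 0.9800, 0.9832, 0.9645),
    (7, 0.0012, 0.0657, 0.9817, 0.9840, 0.9667),
    (8, 0.0005, 0.0668, 0.9812, 0.9836, 0.9667),
]

best_by_f1 = max(RESULTS, key=lambda row: row[3])
best_by_val_loss = min(RESULTS, key=lambda row: row[2])

print("best F1 at epoch", best_by_f1[0], "with F1", best_by_f1[3])
print("lowest validation loss at epoch", best_by_val_loss[0], "with loss", best_by_val_loss[2])
```

Validation loss bottoms out at epoch 2 while training loss keeps falling, so the later epochs may be mildly overfitting even though F1 continues to edge upward until epoch 5.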

Framework versions

Transformers 4.26.1
PyTorch 1.12.1
Datasets 2.8.0
Tokenizers 0.12.1

📄 License

This model is licensed under Apache-2.0.

📦 Information Table

Property	Details
Model Type	vit-base-patch16-224-Futurama_Image_multilabel_clf
Training Data	imagefolder
Metrics	f1, accuracy, roc_auc
