Vit Base Patch16 224 Futurama Image Multilabel Clf
A multi-label image classification model fine-tuned from Google's Vision Transformer (ViT) to identify content in screenshots from the animated series 'Futurama'.
Downloads 19
Release Time: 2/16/2023
Model Overview
This model is a fine-tuned version of google/vit-base-patch16-224 for multi-label classification of 'Futurama' screenshots. It achieves an F1 score of 0.9818 on the evaluation set.
Model Features
High-precision Multi-label Classification
Achieves an F1 score of 0.9818 and an accuracy of 0.9672 on the evaluation set of 'Futurama' screenshots.
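The source does not say how the F1 score and accuracy were computed for the multi-label setting. The sketch below shows one common pairing, micro-averaged F1 and exact-match (subset) accuracy, on a tiny made-up label matrix. The function names and the sample data are illustrative assumptions, not the author's evaluation code.

```python
from typing import List


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true/false positives over all labels and samples."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


# Made-up example: 4 screenshots, 3 hypothetical labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

f1 = micro_f1(y_true, y_pred)          # 5 TP, 0 FP, 1 FN -> 10/11
acc = subset_accuracy(y_true, y_pred)  # 3 of 4 rows match exactly -> 0.75
print(round(f1, 4), acc)
```

Under these definitions a single missed label lowers subset accuracy for the whole sample but only slightly lowers micro F1, so accuracy can sit below F1 on the same predictions.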
Based on ViT Architecture
Uses the Vision Transformer base architecture (ViT-Base), which splits a 224x224 input image into 16x16-pixel patches and processes them with a Transformer encoder.
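The "patch16-224" part of the base checkpoint name fixes the input geometry. As a small worked example, the helper below (a hypothetical name, not part of any library) computes the patch grid and token sequence length for a square RGB input, assuming the standard ViT layout with one extra classification token.

```python
def vit_sequence_shape(image_size: int = 224, patch_size: int = 16) -> dict:
    """Return patch-grid statistics for a square RGB ViT input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size   # patches per side
    num_patches = grid * grid         # total image patches
    return {
        "grid": grid,
        "num_patches": num_patches,
        "sequence_length": num_patches + 1,               # +1 for the [CLS] token
        "values_per_patch": patch_size * patch_size * 3,  # flattened RGB pixels
    }


shape = vit_sequence_shape()
print(shape)
# {'grid': 14, 'num_patches': 196, 'sequence_length': 197, 'values_per_patch': 768}
```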
Fine-tuned
Fine-tuned for 8 epochs, reducing training loss from 0.2456 to 0.0005.
Model Capabilities
Image Classification
Multi-label Recognition
Animated Scene Analysis
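Multi-label recognition differs from single-label classification in how the outputs are decoded: each label gets an independent sigmoid score, and every label above a threshold is kept. The sketch below illustrates that step only. The label names, logits, and the 0.5 threshold are assumptions for illustration; the source does not list the model's label set or its decision threshold.

```python
import math
from typing import Dict, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(
    logits: Sequence[float], labels: Sequence[str], threshold: float = 0.5
) -> Dict[str, float]:
    """Keep every label whose independent sigmoid score reaches the threshold."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    scores = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return {label: s for label, s in scores.items() if s >= threshold}


# Hypothetical label set and made-up logits for a single screenshot.
LABELS = ["fry", "leela", "bender"]
predicted = decode_multilabel([2.0, -1.5, 0.3], LABELS)
print(sorted(predicted))  # ['bender', 'fry']
```

Because each label is scored independently, a screenshot can receive zero, one, or several labels, which is what distinguishes this setup from a softmax classifier that always picks exactly one class.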
Use Cases
Media Content Analysis
Animated Scene Classification
Automatically identifies scene content in 'Futurama' animations
Accuracy of 96.72% on the evaluation set
Content Moderation
Identifies specific content or characters in animations