Vit Base Patch16 224 Futurama Image Multilabel Clf

Developed by DunnBC22
A multi-label image classification model fine-tuned from Google's Vision Transformer (ViT) to identify content in screenshots from the animated series 'Futurama'.
Downloads 19
Release Date: 2/16/2023

Model Overview

This model is a fine-tuned version of google/vit-base-patch16-224 for multi-label classification of screenshots from the animated series 'Futurama'. It achieves an F1 score of 0.9818 on the evaluation set.

Model Features

High-precision Multi-label Classification
Achieved an F1 score of 0.9818 and an accuracy of 0.9672 on the 'Futurama' screenshot dataset.
Based on ViT Architecture
Uses the Vision Transformer base architecture (16x16 patches, 224x224 input) as the image feature extractor.
Fine-tuned
Fine-tuned for 8 epochs, with training loss falling from 0.2456 to 0.0005.
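
The F1 and accuracy figures above differ because the two metrics count errors differently in a multi-label setting. The sketch below shows one common way to compute them: micro-averaged F1 over all (image, label) pairs, and subset accuracy, where an image counts as correct only if every label matches. The page does not say which averaging or accuracy definition was used, so both choices are assumptions, and the tiny label matrices are made-up illustration data rather than model outputs.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives over every (sample, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set matches the true set exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Made-up example: 3 images, 3 labels each (1 = label present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]  # one label missed on the third image

print(round(micro_f1(y_true, y_pred), 4))         # 0.8889
print(round(subset_accuracy(y_true, y_pred), 4))  # 0.6667
```

A single missed label lowers subset accuracy by a whole image but costs F1 only one false negative, which is one plausible reason an F1 score can sit above accuracy on the same evaluation set.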

Model Capabilities

Image Classification
Multi-label Recognition
Animated Scene Analysis
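
Multi-label recognition differs from ordinary single-label classification in how the model's raw outputs (logits) are turned into labels: each label is scored independently with a sigmoid and kept if it passes a threshold, instead of picking one winner with softmax. A minimal sketch of that decoding step follows. The label names, the example logits, and the 0.5 threshold are hypothetical and chosen only for illustration; the real label set and any tuned threshold come from the model's own configuration.

```python
import math

# Hypothetical label names for illustration only.
LABELS = ["Bender", "Fry", "Leela", "Zoidberg"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Keep every label whose independent sigmoid probability meets the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [label for label, logit in zip(labels, logits)
            if sigmoid(logit) >= threshold]


# Made-up logits standing in for one screenshot's model output.
example_logits = [2.3, -1.7, 0.4, -3.0]
print(decode_multilabel(example_logits))  # ['Bender', 'Leela']
```

Because each label is thresholded on its own, the result can contain several labels or none at all, which is the behavior a screenshot with multiple characters (or no known character) requires.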

Use Cases

Media Content Analysis
Animated Scene Classification
Automatically identifies scene content in 'Futurama' animations
Reported accuracy: 96.72%
Content Moderation
Identifies specific content or characters in animations