
VideoMAE Huge Fine-tuned on Kinetics-400

Developed by MCG-NJU
VideoMAE is a video pretraining model based on the Masked Autoencoder (MAE). It is pretrained on video through self-supervised learning and then fine-tuned with supervision on the Kinetics-400 dataset, making it suitable for video classification tasks.
Downloads: 2,984
Release date: 4/16/2023

Model Overview

This model extends the masked autoencoder to the video domain, adopting a standard Vision Transformer architecture with a decoder added on top to predict pixel values of masked patches. It learns internal video representations through pretraining and can be used for downstream video classification tasks.
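The encoder-decoder scheme described above can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the actual VideoMAE-Huge configuration: the patch count, dimensions, masking ratio, and the single-linear-layer modules are all simplified assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; not the real VideoMAE-Huge configuration.
num_patches, patch_dim, hidden = 128, 768, 256
mask_ratio = 0.9  # VideoMAE masks a very high fraction of patches

patches = torch.randn(1, num_patches, patch_dim)  # flattened video patches

# Randomly split patch positions into masked and visible sets.
num_masked = int(num_patches * mask_ratio)
perm = torch.randperm(num_patches)
masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]

# Encoder (stand-in for the ViT backbone) sees only the visible patches.
encoder = nn.Sequential(nn.Linear(patch_dim, hidden), nn.ReLU())
encoded = encoder(patches[:, visible_idx])  # (1, num_visible, hidden)

# Decoder predicts raw pixel values at every patch position.
# Masked positions are filled with placeholder tokens (simplified here;
# the real model uses learned mask tokens plus positional information).
tokens = torch.zeros(1, num_patches, hidden)
tokens[:, visible_idx] = encoded
decoder = nn.Linear(hidden, patch_dim)
pred = decoder(tokens)  # (1, num_patches, patch_dim)

# The reconstruction loss is computed only on the masked patches.
loss = nn.functional.mse_loss(pred[:, masked_idx], patches[:, masked_idx])
```

Because the encoder processes only the small visible subset, this design keeps pretraining cheap even at high masking ratios.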

Model Features

Self-supervised Pretraining
Pretrained for 1,600 epochs with self-supervision to learn rich internal video representations
Efficient Video Learning
Based on the masked autoencoder framework, achieves efficient video feature learning by predicting pixel values of masked patches
Large-scale Fine-tuning
Supervised fine-tuning on the Kinetics-400 dataset, suitable for 400-class video classification tasks
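For the fine-tuning stage above, the pixel decoder is discarded and a classification head is trained on the encoder's output. A minimal sketch, with illustrative sizes and a simplified mean-pooling head that is an assumption rather than the exact VideoMAE-Huge head:

```python
import torch
import torch.nn as nn

hidden, num_classes = 256, 400  # Kinetics-400 defines 400 action labels

# Stand-in for pretrained encoder output: one feature vector per patch.
features = torch.randn(1, 128, hidden)

# Fine-tuning: drop the pixel decoder, attach a linear classification
# head over pooled encoder features (pooling strategy simplified here).
head = nn.Linear(hidden, num_classes)
logits = head(features.mean(dim=1))  # mean-pool tokens, then classify
probs = logits.softmax(dim=-1)       # (1, 400) class probabilities
predicted_class = probs.argmax(dim=-1).item()
```

During fine-tuning the head (and usually the encoder) is trained with cross-entropy against the Kinetics-400 labels.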

Model Capabilities

Video feature extraction
Video classification
Self-supervised learning

Use Cases

Video Content Analysis
Action Recognition
Identify human actions and behaviors in videos
Achieves 86.6% top-1 accuracy on the Kinetics-400 test set
Video Classification
Classify and label video content
Supports classification with 400 Kinetics-400 labels
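Top-1 accuracy, the metric quoted above, counts a clip as correct when the highest-scoring class matches its label. A minimal computation with made-up logits and labels (the values below are purely illustrative):

```python
import torch

# Made-up logits for 4 clips over 400 Kinetics-400 classes, with labels.
logits = torch.randn(4, 400)
labels = torch.tensor([3, 17, 255, 399])
logits[range(4), labels] += 100.0  # force correct predictions for the demo

# Fraction of clips whose argmax class equals the label.
top1 = (logits.argmax(dim=-1) == labels).float().mean().item()
```

Averaging this quantity over the Kinetics-400 test set is what yields the reported 86.6%.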