
VideoMAE Base Fine-tuned on Kinetics

Developed by MCG-NJU
VideoMAE is a self-supervised video pre-training model based on the Masked Autoencoder (MAE), fine-tuned on the Kinetics-400 dataset for video classification.
Downloads 44.91k
Release Date: 7/8/2022

Model Overview

This model is pre-trained in a self-supervised manner and then fine-tuned with supervision on the Kinetics-400 dataset; it classifies a video into one of 400 possible action categories.

Model Features

Self-supervised Pre-training
Uses the Masked Autoencoder (MAE) method for self-supervised pre-training, learning video representations without labels
Efficient Video Representation
By predicting the pixel values of masked video patches, the model learns effective video feature representations
Transformer Architecture
Based on the Vision Transformer architecture, it processes sequences of video patches, making it well suited to temporal video modeling
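The masking scheme behind these features can be sketched in a few lines. The sketch below assumes the typical VideoMAE setup (a 16-frame clip, 16x16 spatial patches, 2-frame tubelets, and a high masking ratio around 90%) and shows "tube" masking, where one spatial mask is shared across all time steps so a masked patch stays hidden in every frame; the exact numbers are illustrative, not taken from this card.

```python
import numpy as np

# Assumed dimensions: 16-frame clip of 224x224 frames, 16x16 spatial
# patches, 2-frame tubelets -> (16/2) * (224/16)**2 = 1568 tokens.
T, H, W = 16, 224, 224
tubelet, patch = 2, 16
n_time = T // tubelet           # 8 temporal positions
n_space = (H // patch) ** 2     # 196 spatial positions per time step
mask_ratio = 0.9                # VideoMAE masks roughly 90% of tokens

rng = np.random.default_rng(0)

# Tube masking: draw ONE spatial mask, then repeat it over every time
# step, so the same patch locations are hidden in all frames.
n_masked_space = int(n_space * mask_ratio)
space_mask = np.zeros(n_space, dtype=bool)
space_mask[rng.choice(n_space, n_masked_space, replace=False)] = True
mask = np.tile(space_mask, n_time)   # shape (1568,), True = masked

visible = int((~mask).sum())
print(visible)  # 160 visible tokens out of 1568
```

Only the visible tokens are fed through the encoder during pre-training, which is what makes the MAE approach computationally efficient; the decoder then reconstructs the pixels of the masked patches.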

Model Capabilities

Video Classification
Video Feature Extraction
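Both capabilities are exposed through the Hugging Face transformers library. A minimal classification example follows, assuming the checkpoint is hosted on the Hub under the developer/model name shown above (MCG-NJU/videomae-base-finetuned-kinetics) and using a random dummy clip in place of a real video:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# A video is passed as a list of frames; here, 16 random 224x224 RGB
# frames stand in for a real decoded clip.
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))

ckpt = "MCG-NJU/videomae-base-finetuned-kinetics"
processor = VideoMAEImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 400), one score per class

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])    # predicted Kinetics-400 label
```

For feature extraction, the base (non-fine-tuned) VideoMAE encoder can be used in the same way via `VideoMAEModel` to obtain patch-level hidden states instead of class logits.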

Use Cases

Video Understanding
Kinetics-400 Video Classification
Classify videos into 400 categories from the Kinetics-400 dataset
Achieves 80.9% top-1 accuracy and 94.7% top-5 accuracy on the Kinetics-400 test set
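Top-k accuracy counts a prediction as correct when the true label appears among the model's k highest-scoring classes, which is why top-5 accuracy is always at least as high as top-1. A small self-contained sketch (toy 5-class scores, not Kinetics data):

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    # Fraction of samples whose true label is among the k highest scores.
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(labels, topk)]
    return sum(hits) / len(labels)

# Toy example: 4 samples over 5 classes (illustrative only).
logits = np.array([
    [0.1, 0.9, 0.0, 0.0, 0.0],   # top-1 prediction: class 1
    [0.4, 0.3, 0.2, 0.1, 0.0],   # top-1 prediction: class 0
    [0.0, 0.2, 0.3, 0.4, 0.1],   # top-1 prediction: class 3
    [0.5, 0.1, 0.1, 0.1, 0.2],   # top-1 prediction: class 0
])
labels = [1, 0, 2, 3]
print(topk_accuracy(logits, labels, 1))  # 0.5
print(topk_accuracy(logits, labels, 5))  # 1.0
```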