
VideoMAE Large

Developed by MCG-NJU
VideoMAE is a self-supervised video pre-training model based on the Masked Autoencoder (MAE) framework. It learns video representations by predicting the pixel values of masked video patches.
Downloads: 3,243
Release Time: 8/2/2022

Model Overview

This model adopts a Vision Transformer architecture and is pre-trained on the Kinetics-400 dataset in a self-supervised manner, making it suitable as a feature extractor for video understanding tasks.
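
A minimal feature-extraction sketch using the Hugging Face transformers API, assuming the checkpoint is published as MCG-NJU/videomae-large; random frames stand in for a real 16-frame clip:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEModel

# 16 frames of 3x224x224 random pixels stand in for a sampled video clip.
video = list(np.random.randint(0, 256, (16, 3, 224, 224), dtype=np.uint8))

# Checkpoint name assumed; swap in the actual hub identifier if it differs.
processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-large")
model = VideoMAEModel.from_pretrained("MCG-NJU/videomae-large")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding per spatio-temporal patch; pool these for downstream tasks.
print(outputs.last_hidden_state.shape)  # (1, num_patches, hidden_size)
```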

Model Features

Video Self-supervised Learning
Uses a masked autoencoder framework to learn video representations without manual annotations (a minimal sketch of the objective follows this list).
Efficient Data Utilization
Significantly reduces reliance on labeled data compared to fully supervised methods.
Transformer Architecture
Encoder-decoder structure based on the Vision Transformer, suited to processing video sequence data.
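
As a sketch of the masked-autoencoder pre-training objective, a boolean mask over the tubelet tokens is passed alongside the pixel values and the model returns the reconstruction loss on the masked positions. This assumes the transformers VideoMAEForPreTraining class and the MCG-NJU/videomae-large checkpoint; VideoMAE itself uses a high-ratio tube mask rather than the random mask shown here:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForPreTraining

num_frames = 16
video = list(np.random.randint(0, 256, (num_frames, 3, 224, 224), dtype=np.uint8))

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-large")
model = VideoMAEForPreTraining.from_pretrained("MCG-NJU/videomae-large")

pixel_values = processor(video, return_tensors="pt").pixel_values

# Each token covers one tubelet: a patch_size x patch_size region spanning
# tubelet_size consecutive frames.
num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame

# Random boolean mask for illustration; the paper masks ~90% of tubes.
bool_masked_pos = torch.randint(0, 2, (1, seq_length)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.loss)  # pixel-reconstruction loss on the masked patches
```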

Model Capabilities

Video Feature Extraction
Masked Pixel Prediction
Video Representation Learning

Use Cases

Video Understanding
Video Classification
Fine-tune the pre-trained model for video classification tasks (a classification sketch follows this list).
Action Recognition
Extract video features for human action recognition.
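
A hedged sketch of the classification use case, assuming a Kinetics-400 fine-tuned variant is available on the hub under a name such as MCG-NJU/videomae-large-finetuned-kinetics (an assumed identifier); the predicted label comes from that checkpoint's config:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# Random frames stand in for a real 16-frame clip sampled from a video.
video = list(np.random.randint(0, 256, (16, 3, 224, 224), dtype=np.uint8))

ckpt = "MCG-NJU/videomae-large-finetuned-kinetics"  # assumed checkpoint name
processor = VideoMAEImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])  # one of the 400 Kinetics action labels
```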