
VideoMAE Base Short

Developed by MCG-NJU
VideoMAE is a self-supervised video pretraining model that extends the Masked Autoencoder (MAE) framework to video. It learns internal video representations by predicting masked spatiotemporal patches and is suited to downstream tasks such as video classification.
Downloads: 886
Release Time: 7/7/2022

Model Overview

This model extends the Masked Autoencoder framework to the video domain: a standard Vision Transformer encoder is paired with a decoder that predicts the pixel values of masked patches. It is primarily used for video feature extraction and for fine-tuning on downstream tasks.
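As a sketch of this masked-prediction objective, the snippet below uses the Hugging Face transformers classes for VideoMAE, assuming the checkpoint ID MCG-NJU/videomae-base-short; the random clip and random mask are placeholders, not real data:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForPreTraining

# Placeholder 16-frame clip of 224x224 RGB frames (random pixels, not real video).
num_frames = 16
video = list(np.random.randint(0, 256, (num_frames, 3, 224, 224)))

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base-short")
model = VideoMAEForPreTraining.from_pretrained("MCG-NJU/videomae-base-short")

pixel_values = processor(video, return_tensors="pt").pixel_values

# Sequence length = (frames / tubelet_size) * patches per frame.
num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame

# Random boolean mask over patch positions; True marks patches to reconstruct.
bool_masked_pos = torch.randint(0, 2, (1, seq_length)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss = outputs.loss  # reconstruction loss on the masked patches
```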

Model Features

Video Self-supervised Learning
Adopts the Masked Autoencoder framework, pretraining without labels by predicting masked video patches.
Data Efficiency
Learns effective video representations from less labeled data than fully supervised methods require.
Transformer Architecture
Built on the standard Vision Transformer, offering strong scalability and transfer capability.

Model Capabilities

Video Feature Extraction (see the sketch after this list)
Video Representation Learning
Masked Patch Prediction
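For feature extraction, the encoder can be used on its own. A minimal sketch with the transformers VideoMAEModel class, again assuming the MCG-NJU/videomae-base-short checkpoint and a placeholder clip:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEModel

video = list(np.random.randint(0, 256, (16, 3, 224, 224)))  # placeholder clip

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base-short")
model = VideoMAEModel.from_pretrained("MCG-NJU/videomae-base-short")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding per spatiotemporal patch: (batch, seq_len, hidden_size).
features = outputs.last_hidden_state
print(features.shape)  # torch.Size([1, 1568, 768]) for this base config
```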

Use Cases

Video Understanding
Video Classification
Fine-tune the pretrained model for video classification tasks (a setup sketch follows this list).
Action Recognition
Can be used for human action recognition tasks in videos.
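A sketch of setting up fine-tuning for classification: loading the pretrained backbone with a freshly initialized classification head via VideoMAEForVideoClassification. The label set here is hypothetical, for illustration only; a real run would iterate over a labeled video dataset with an optimizer.

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# Hypothetical 3-class action-recognition label set, for illustration only.
labels = ["walking", "running", "jumping"]
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base-short",
    num_labels=len(labels),
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
)
processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base-short")

video = list(np.random.randint(0, 256, (16, 3, 224, 224)))  # placeholder clip
inputs = processor(video, return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([0]))

# Cross-entropy loss over the new head; logits have shape (1, 3).
print(outputs.loss, outputs.logits.shape)
```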