VideoMAE Base

Developed by MCG-NJU
VideoMAE is a self-supervised video pretraining model based on the Masked Autoencoder (MAE): it learns internal video representations by predicting the pixel values of masked video patches.
Downloads: 48.66k
Release date: 8/3/2022

Model Overview

This model extends the Masked Autoencoder to the video domain, employing a Vision Transformer architecture with an added decoder for predicting pixel values of masked patches. Primarily used for video feature extraction and downstream task fine-tuning.
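As a rough sketch of how the Vision Transformer sees a clip, the arithmetic below assumes the common VideoMAE-Base configuration (16 input frames at 224×224 resolution, 16×16 spatial patches, tubelet size of 2 frames); these numbers are illustrative assumptions, not taken from this page:

```python
# Token-grid arithmetic for an assumed VideoMAE-Base setup:
# 16 frames, 224x224 input, 16x16 patches, tubelet size 2.
def num_video_tokens(frames=16, height=224, width=224,
                     patch=16, tubelet=2):
    """Number of space-time tube tokens the ViT encoder sees."""
    return (frames // tubelet) * (height // patch) * (width // patch)

tokens = num_video_tokens()
print(tokens)  # 8 temporal slices x 14 x 14 spatial patches = 1568
```

Each tube token spans a 16×16 patch across 2 consecutive frames, so the encoder processes a much shorter sequence than one token per frame-patch would give.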

Model Features

Video Self-supervised Learning
Achieves unsupervised pretraining through masked video patch prediction tasks, reducing reliance on labeled data
Efficient Data Utilization
Learns effective video representations with less data compared to traditional methods
Flexible Downstream Applications
The pretrained model can be fine-tuned for a variety of video understanding tasks
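The masked-prediction objective behind these features can be sketched in a few lines. This is a simplified stand-in, assuming VideoMAE's characteristically high masking ratio (around 90%) and tube masking, i.e. the same spatial patches are hidden in every temporal slice:

```python
import random

def tube_mask(spatial_positions=14 * 14, temporal_slices=8,
              mask_ratio=0.9, seed=0):
    """Flat boolean mask over all tube tokens; True = masked.

    Tube masking hides the same spatial positions across all
    temporal slices, which prevents the model from trivially
    copying a patch from a neighboring frame.
    """
    rng = random.Random(seed)
    n_masked = int(spatial_positions * mask_ratio)
    masked_spatial = set(rng.sample(range(spatial_positions), n_masked))
    # Repeat the spatial mask for each temporal slice.
    return [pos in masked_spatial
            for _ in range(temporal_slices)
            for pos in range(spatial_positions)]

mask = tube_mask()
print(len(mask), sum(mask))  # 1568 tokens, 1408 of them masked (~90%)
```

During pretraining, only the unmasked ~10% of tokens pass through the encoder, and a lightweight decoder reconstructs the pixels of the masked tubes; this is what makes the pretraining both data-efficient and cheap.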

Model Capabilities

Video Feature Extraction
Masked Patch Pixel Prediction
Video Representation Learning

Use Cases

Video Understanding
Video Classification
Fine-tune by adding a classification layer on top of the pretrained encoder
Action Recognition
Recognize specific actions using learned video representations
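For downstream classification, the decoder is dropped and a small head is trained on the encoder's output. The sketch below is hypothetical (a random tensor stands in for the pretrained encoder's features; the hidden size of 768 and the 400 classes of a Kinetics-400-style label set are illustrative assumptions, not this page's specification):

```python
import torch
from torch import nn

# Hypothetical fine-tuning head: a linear classifier over the
# mean-pooled encoder tokens. Sizes are illustrative assumptions.
hidden_size, num_classes = 768, 400  # e.g. a Kinetics-400-style setup
classifier = nn.Linear(hidden_size, num_classes)

# Stand-in for the pretrained encoder's output: (batch, tokens, hidden).
features = torch.randn(2, 1568, hidden_size)
logits = classifier(features.mean(dim=1))  # pool over the token axis
print(tuple(logits.shape))  # (2, 400)
```

In practice the head and (usually) the encoder are trained jointly with a standard cross-entropy loss on labeled clips.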