VideoMAE Base Fine-tuned on SSV2

Developed by MCG-NJU
VideoMAE is a self-supervised video pretraining model based on the Masked Autoencoder (MAE), fine-tuned on the Something-Something-v2 dataset for video classification.
Downloads: 951
Release date: 8/2/2022

Model Overview

This model is pretrained in a self-supervised manner and fine-tuned in a supervised way on the Something-Something-v2 dataset, primarily for video classification tasks.

Model Features

Self-Supervised Pretraining
Uses the Masked Autoencoder (MAE) method for self-supervised video pretraining, reducing reliance on labeled data
Efficient Video Representation Learning
Learns internal video representations through a masking-and-reconstruction mechanism, extracting video features effectively
Transformer Architecture
Based on the Vision Transformer architecture; videos are processed as sequences of fixed-size patches
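The patch-sequence sizing can be sketched numerically. A minimal sketch, assuming the standard VideoMAE-Base configuration from the VideoMAE paper (16 frames at 224×224 resolution, 16×16 spatial patches, tubelet size 2, ~90% masking ratio) — these values are assumptions, as the card itself does not state them:

```python
# Token count for one VideoMAE-Base input clip (assumed standard config).
frames = 16           # sampled frames per clip
height = width = 224  # input resolution
patch = 16            # spatial patch size
tubelet = 2           # frames grouped into one temporal "tube"

# Each token covers a tubelet x patch x patch cube of the clip.
tokens = (frames // tubelet) * (height // patch) * (width // patch)
print(tokens)  # 8 * 14 * 14 = 1568 patch tokens

# Pretraining masks a very high ratio of these tokens (90% in the paper),
# so the encoder only processes the small visible subset.
mask_ratio = 0.9
visible = round(tokens * (1 - mask_ratio))
print(visible)  # ~157 visible tokens per clip
```

The high masking ratio is what makes the pretraining efficient: the encoder runs on roughly a tenth of the tokens, and a lightweight decoder reconstructs the masked cubes.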

Model Capabilities

Video Classification
Video Feature Extraction

Use Cases

Video Understanding
Action Recognition
Recognizing human actions and behaviors in video clips
Achieves 70.6% top-1 accuracy on the Something-Something-v2 test set
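A minimal inference sketch using the Hugging Face `transformers` library. The checkpoint name `MCG-NJU/videomae-base-finetuned-ssv2` and the use of random frames as stand-in input are assumptions; in practice the 16 frames would come from a video decoder such as `decord` or `pyav`:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# 16 random RGB frames as a stand-in for a real decoded video clip.
video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(16)]

ckpt = "MCG-NJU/videomae-base-finetuned-ssv2"  # assumed checkpoint name
processor = VideoMAEImageProcessor.from_pretrained(ckpt)
model = VideoMAEForVideoClassification.from_pretrained(ckpt)

inputs = processor(video, return_tensors="pt")  # pixel_values: (1, 16, 3, 224, 224)
with torch.no_grad():
    logits = model(**inputs).logits  # one score per Something-Something-v2 class

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```

The classifier head outputs one logit per Something-Something-v2 label, so `argmax` over the last dimension picks the predicted action class.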