# Masked Autoencoder

Models pre-trained with the self-supervised masked autoencoder (MAE) method:

| Model | Publisher | Category | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- |
| Vit Base Patch16 1024 128.audiomae As2m Ft As20k | gaunernst | Audio Classification | 335 | 2 | A Vision Transformer (ViT)-based audio model, pre-trained on AudioSet-2M with the self-supervised masked autoencoder (MAE) method and fine-tuned on AudioSet-20k. |
| Videomae Base | MCG-NJU | Video Processing, Transformers | 48.66k | 45 | A self-supervised video pre-training model based on the Masked Autoencoder (MAE), which learns video representations by predicting the pixel values of masked video patches. |
| Videomae Base Finetuned Ssv2 | MCG-NJU | Video Processing, Transformers | 951 | 6 | VideoMAE fine-tuned on the Something-Something-v2 dataset for video classification. |
| Videomae Base Ssv2 | MCG-NJU | Video Processing, Transformers | 454 | 2 | VideoMAE pre-trained for 2,400 epochs on the Something-Something-v2 dataset. |
| Videomae Large Finetuned Kinetics | MCG-NJU | Video Processing, Transformers | 4,657 | 12 | VideoMAE (large) fine-tuned on the Kinetics-400 dataset for video classification. |
| Videomae Large | MCG-NJU | Video Processing, Transformers | 3,243 | 31 | VideoMAE (large), pre-trained by predicting the pixel values of masked video patches. |
| Videomae Base Short Finetuned Kinetics | MCG-NJU | Video Processing, Transformers | 62 | 3 | VideoMAE (base, short) fine-tuned on the Kinetics-400 dataset for video classification. |
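The VideoMAE descriptions above all refer to the same pretraining objective: mask most of the video patches and train the model to reconstruct the pixel values of the masked ones, with the same spatial mask repeated across frames ("tube" masking). The following is a minimal NumPy sketch of that idea under stated assumptions; the function names and shapes are illustrative, not the actual VideoMAE implementation.

```python
import numpy as np


def random_tube_mask(num_patches_per_frame, num_frames, mask_ratio=0.9, rng=None):
    """Sample one spatial patch mask and repeat it across all frames
    ('tube' masking, as described for VideoMAE). Returns a boolean array
    of shape (num_frames, num_patches_per_frame); True marks a masked patch."""
    if rng is None:
        rng = np.random.default_rng()
    num_masked = int(num_patches_per_frame * mask_ratio)
    spatial = np.zeros(num_patches_per_frame, dtype=bool)
    # Choose which spatial positions to hide, once, then tile over time.
    spatial[rng.choice(num_patches_per_frame, num_masked, replace=False)] = True
    return np.tile(spatial, (num_frames, 1))


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked patches only -- the MAE pretraining
    objective. pred/target: (num_frames, num_patches, patch_dim)."""
    per_patch_mse = ((pred - target) ** 2).mean(axis=-1)
    return per_patch_mse[mask].mean()
```

For a 224x224 frame with 16x16 patches there are 196 patches per frame; with a 0.9 mask ratio, only about 20 visible patches per frame are fed to the encoder, which is what makes MAE pretraining cheap relative to full-sequence training.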