# Action Recognition

**Test With Sdfvd** · cocovani · Video Processing, Transformers · 16 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base, with middling performance on the evaluation set (50% accuracy).

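Several entries on this page are lightly tuned variants of MCG-NJU/videomae-base. As a rough sketch of how such fine-tunes are typically produced with the Hugging Face Trainer; the toy dataset, label names, output path, and hyperparameters below are placeholders, not the settings any of these authors used:

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments, VideoMAEForVideoClassification

labels = ["class_a", "class_b"]  # hypothetical label set

class ToyVideoDataset(Dataset):
    """Synthetic stand-in for a real dataset of decoded, normalized clips."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # VideoMAE-base consumes 16 frames of 3x224x224 pixels.
        return {
            "pixel_values": torch.randn(16, 3, 224, 224),
            "labels": torch.tensor(idx % len(labels)),
        }

# The base checkpoint has no classification head, so a fresh one is
# initialized for the requested number of labels.
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

args = TrainingArguments(
    output_dir="videomae-base-finetuned-toy",  # hypothetical output path
    per_device_train_batch_size=2,
    learning_rate=1e-5,
    num_train_epochs=1,
    remove_unused_columns=False,  # keep pixel_values for the model
)

Trainer(model=model, args=args, train_dataset=ToyVideoDataset()).train()
```
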
**Datatrain Videomae Base Finetuned Lr1e 07 Poly3** · EloiseInacio · Video Processing, Transformers · 13 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset, reaching 11.1% accuracy.

**Videomae Base Finetuned 1e 08 Bs4 Ep2** · EloiseInacio · Video Processing, Transformers · 14 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset.

**Model Timesformer Subset 02** · namnh2002 · Video Processing, Transformers · 15 downloads · 0 likes
A video understanding model based on the TimeSformer architecture, fine-tuned on an unknown dataset and reaching 88.52% accuracy.

**Sign Language Classification V1** · joseluhf11 · Apache-2.0 · Image Classification, Transformers · 40 downloads · 2 likes
A sign language classification model fine-tuned from Google's Vision Transformer (ViT), achieving 80.56% accuracy.

**Videomae Base Ipm All Videos** · rickysk · Video Processing, Transformers · 30 downloads · 0 likes
A video understanding model fine-tuned from the VideoMAE base model on an unknown video dataset, achieving 85.59% accuracy on the evaluation set.

**Videomae Huge Finetuned Kinetics** · MCG-NJU · Video Processing, Transformers · 2,984 downloads · 4 likes
VideoMAE is a video model pre-trained with self-supervised masked autoencoding (MAE) and fine-tuned on the Kinetics-400 dataset for video classification.

**Videomae Base Finetuned** · sheraz179 · Video Processing, Transformers · 15 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset, achieving an F1 score of 0.7147.

**Videomae Base Finetuned** · LouisDT · Video Processing, Transformers · 15 downloads · 0 likes
A video understanding model fine-tuned from the VideoMAE base model on an unknown dataset, achieving 86.41% accuracy on the evaluation set.

**Timesformer Hr Finetuned K400** · facebook · Video Processing, Transformers · 178 downloads · 2 likes
TimeSformer is a video understanding model built on spatio-temporal attention, pre-trained and fine-tuned on the Kinetics-400 dataset.

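For reference, querying a TimeSformer checkpoint follows the standard transformers pattern. A minimal inference sketch with a random clip standing in for real frames, assuming the 16-frame, 448x448 input that the high-resolution variant's model card describes:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# Placeholder clip: 16 RGB frames (the HR variant runs at 448x448).
video = list(np.random.randint(0, 256, (16, 448, 448, 3), dtype=np.uint8))

processor = AutoImageProcessor.from_pretrained("facebook/timesformer-hr-finetuned-k400")
model = TimesformerForVideoClassification.from_pretrained("facebook/timesformer-hr-finetuned-k400")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the top logit to its Kinetics-400 label.
print(model.config.id2label[logits.argmax(-1).item()])
```
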
**Xclip Base Patch16 Hmdb 16 Shot** · microsoft · MIT · Video Processing, Transformers, English · 49 downloads · 0 likes
X-CLIP extends CLIP to general video-language understanding, supporting video classification and video-text retrieval.

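The video-text retrieval ability mentioned here comes from scoring a clip against free-form text prompts. A minimal zero-shot sketch using the generic microsoft/xclip-base-patch32 checkpoint (which expects 8 frames; the HMDB few-shot checkpoints listed here may expect a different frame count, so check their cards) and hypothetical prompts:

```python
import numpy as np
import torch
from transformers import XCLIPModel, XCLIPProcessor

# Placeholder clip: 8 RGB frames of 224x224.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))
prompts = ["brushing hair", "climbing stairs", "riding a bike"]  # hypothetical labels

processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

inputs = processor(text=prompts, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    # One row per video, one column per prompt; softmax gives match probabilities.
    probs = model(**inputs).logits_per_video.softmax(dim=1)

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```
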
**Xclip Base Patch16 Hmdb 8 Shot** · microsoft · MIT · Text-to-Video, Transformers, English · 17 downloads · 1 like
X-CLIP extends CLIP to general video-language understanding; trained by contrastive learning on video-text pairs, it suits video classification and video-text retrieval.

**Xclip Base Patch16 Hmdb 2 Shot** · microsoft · MIT · Text-to-Video, Transformers, English · 19 downloads · 0 likes
X-CLIP extends CLIP to general video-language understanding; trained by contrastive learning on video-text pairs, it supports zero-shot, few-shot, and fully supervised video classification.

**Videomae Base Finetuned Ssv2** · MCG-NJU · Video Processing, Transformers · 951 downloads · 6 likes
VideoMAE is a self-supervised video pre-training model based on the masked autoencoder (MAE), fine-tuned on the Something-Something-V2 dataset for video classification.

**Videomae Base Finetuned Kinetics** · MCG-NJU · Video Processing, Transformers · 44.91k downloads · 34 likes
VideoMAE is a self-supervised video pre-training model based on the masked autoencoder (MAE), fine-tuned on the Kinetics-400 dataset for video classification.

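Since this is by far the most downloaded model on the page, a minimal inference sketch may be useful. It follows the standard pattern from the transformers documentation, with random pixels standing in for a real 16-frame clip:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# Placeholder clip: 16 RGB frames of 224x224.
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")
model = VideoMAEForVideoClassification.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the top logit to its Kinetics-400 label.
print(model.config.id2label[logits.argmax(-1).item()])
```
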
**Video Classification Cnn Rnn** · keras-io · Video Processing · 57 downloads · 14 likes
A video classification model with a hybrid CNN-RNN architecture for action recognition.

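The keras-io recipe behind this kind of model extracts per-frame features with a pretrained CNN (e.g. InceptionV3) and classifies the feature sequence with a recurrent head. A minimal sketch of the recurrent half, assuming features are already extracted; the frame count, feature width, class count, and layer sizes are placeholders:

```python
import numpy as np
from tensorflow import keras

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 20, 2048, 10  # placeholder sizes

# Sequence model over pre-extracted per-frame CNN features
# (e.g. 2048-d InceptionV3 embeddings, one vector per frame).
inputs = keras.Input(shape=(NUM_FRAMES, NUM_FEATURES))
x = keras.layers.GRU(16, return_sequences=True)(inputs)
x = keras.layers.GRU(8)(x)
x = keras.layers.Dropout(0.4)(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random features stand in for real extracted ones.
X = np.random.randn(32, NUM_FRAMES, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(32,))
model.fit(X, y, epochs=1, batch_size=8)
```

Splitting the pipeline this way keeps training cheap: the expensive CNN runs once per frame offline, and only the small recurrent head is trained on the sequences.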