# Action Recognition

**Test With Sdfvd** · cocovani · Video Processing, Transformers · 16 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base, with middling performance on the evaluation set (50% accuracy).

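Several entries on this page are lightly tuned variants of MCG-NJU/videomae-base. As a rough sketch of how such fine-tunes are typically produced with the Hugging Face Trainer; the toy dataset, label names, output path, and hyperparameters below are placeholders, not the settings any of these authors used:

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments, VideoMAEForVideoClassification

labels = ["class_a", "class_b"]  # hypothetical label set

class ToyVideoDataset(Dataset):
    """Synthetic stand-in for a real dataset of decoded, normalized clips."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # VideoMAE-base consumes 16 frames of 3x224x224 pixels.
        return {
            "pixel_values": torch.randn(16, 3, 224, 224),
            "labels": torch.tensor(idx % len(labels)),
        }

# The base checkpoint has no classification head, so a fresh one is
# initialized for the requested number of labels.
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

args = TrainingArguments(
    output_dir="videomae-base-finetuned-toy",  # hypothetical output path
    per_device_train_batch_size=2,
    learning_rate=1e-5,
    num_train_epochs=1,
    remove_unused_columns=False,  # keep pixel_values for the model
)

Trainer(model=model, args=args, train_dataset=ToyVideoDataset()).train()
```
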
**Datatrain Videomae Base Finetuned Lr1e 07 Poly3** · EloiseInacio · Video Processing, Transformers · 13 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset, reaching 11.1% accuracy.

**Videomae Base Finetuned 1e 08 Bs4 Ep2** · EloiseInacio · Video Processing, Transformers · 14 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset.

**Model Timesformer Subset 02** · namnh2002 · Video Processing, Transformers · 15 downloads · 0 likes
A video understanding model based on the TimeSformer architecture, fine-tuned on an unknown dataset and reaching 88.52% accuracy.

**Sign Language Classification V1** · joseluhf11 · Apache-2.0 · Image Classification, Transformers · 40 downloads · 2 likes
A sign language classification model fine-tuned from Google's Vision Transformer (ViT), achieving 80.56% accuracy.

**Videomae Base Ipm All Videos** · rickysk · Video Processing, Transformers · 30 downloads · 0 likes
A video understanding model fine-tuned from the VideoMAE base model on an unknown video dataset, achieving 85.59% accuracy on the evaluation set.

**Videomae Huge Finetuned Kinetics** · MCG-NJU · Video Processing, Transformers · 2,984 downloads · 4 likes
VideoMAE is a video model pre-trained with self-supervised masked autoencoding (MAE) and fine-tuned on the Kinetics-400 dataset for video classification.

**Videomae Base Finetuned** · sheraz179 · Video Processing, Transformers · 15 downloads · 0 likes
A video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset, achieving an F1 score of 0.7147.

**Videomae Base Finetuned** · LouisDT · Video Processing, Transformers · 15 downloads · 0 likes
A video understanding model fine-tuned from the VideoMAE base model on an unknown dataset, achieving 86.41% accuracy on the evaluation set.

**Timesformer Hr Finetuned K400** · facebook · Video Processing, Transformers · 178 downloads · 2 likes
TimeSformer is a video understanding model built on spatio-temporal attention, pre-trained and fine-tuned on the Kinetics-400 dataset.

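For reference, querying a TimeSformer checkpoint follows the standard transformers pattern. A minimal inference sketch with a random clip standing in for real frames, assuming the 16-frame, 448x448 input that the high-resolution variant's model card describes:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# Placeholder clip: 16 RGB frames (the HR variant runs at 448x448).
video = list(np.random.randint(0, 256, (16, 448, 448, 3), dtype=np.uint8))

processor = AutoImageProcessor.from_pretrained("facebook/timesformer-hr-finetuned-k400")
model = TimesformerForVideoClassification.from_pretrained("facebook/timesformer-hr-finetuned-k400")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the top logit to its Kinetics-400 label.
print(model.config.id2label[logits.argmax(-1).item()])
```
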
**Xclip Base Patch16 Hmdb 16 Shot** · microsoft · MIT · Video Processing, Transformers, English · 49 downloads · 0 likes
X-CLIP extends CLIP to general video-language understanding, supporting video classification and video-text retrieval.

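The video-text retrieval ability mentioned here comes from scoring a clip against free-form text prompts. A minimal zero-shot sketch using the generic microsoft/xclip-base-patch32 checkpoint (which expects 8 frames; the HMDB few-shot checkpoints listed here may expect a different frame count, so check their cards) and hypothetical prompts:

```python
import numpy as np
import torch
from transformers import XCLIPModel, XCLIPProcessor

# Placeholder clip: 8 RGB frames of 224x224.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))
prompts = ["brushing hair", "climbing stairs", "riding a bike"]  # hypothetical labels

processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

inputs = processor(text=prompts, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    # One row per video, one column per prompt; softmax gives match probabilities.
    probs = model(**inputs).logits_per_video.softmax(dim=1)

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```
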
**Xclip Base Patch16 Hmdb 8 Shot** · microsoft · MIT · Text-to-Video, Transformers, English · 17 downloads · 1 like
X-CLIP extends CLIP to general video-language understanding; trained by contrastive learning on video-text pairs, it suits video classification and video-text retrieval.

**Xclip Base Patch16 Hmdb 2 Shot** · microsoft · MIT · Text-to-Video, Transformers, English · 19 downloads · 0 likes
X-CLIP extends CLIP to general video-language understanding; trained by contrastive learning on video-text pairs, it supports zero-shot, few-shot, and fully supervised video classification.

**Videomae Base Finetuned Ssv2** · MCG-NJU · Video Processing, Transformers · 951 downloads · 6 likes
VideoMAE is a self-supervised video pre-training model based on the masked autoencoder (MAE), fine-tuned on the Something-Something-V2 dataset for video classification.

**Videomae Base Finetuned Kinetics** · MCG-NJU · Video Processing, Transformers · 44.91k downloads · 34 likes
VideoMAE is a self-supervised video pre-training model based on the masked autoencoder (MAE), fine-tuned on the Kinetics-400 dataset for video classification.

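Since this is by far the most downloaded model on the page, a minimal inference sketch may be useful. It follows the standard pattern from the transformers documentation, with random pixels standing in for a real 16-frame clip:

```python
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification

# Placeholder clip: 16 RGB frames of 224x224.
video = list(np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8))

processor = VideoMAEImageProcessor.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")
model = VideoMAEForVideoClassification.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")

inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the top logit to its Kinetics-400 label.
print(model.config.id2label[logits.argmax(-1).item()])
```
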
**Video Classification Cnn Rnn** · keras-io · Video Processing · 57 downloads · 14 likes
A video classification model with a hybrid CNN-RNN architecture for action recognition.

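The keras-io recipe behind this kind of model extracts per-frame features with a pretrained CNN (e.g. InceptionV3) and classifies the feature sequence with a recurrent head. A minimal sketch of the recurrent half, assuming features are already extracted; the frame count, feature width, class count, and layer sizes are placeholders:

```python
import numpy as np
from tensorflow import keras

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 20, 2048, 10  # placeholder sizes

# Sequence model over pre-extracted per-frame CNN features
# (e.g. 2048-d InceptionV3 embeddings, one vector per frame).
inputs = keras.Input(shape=(NUM_FRAMES, NUM_FEATURES))
x = keras.layers.GRU(16, return_sequences=True)(inputs)
x = keras.layers.GRU(8)(x)
x = keras.layers.Dropout(0.4)(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random features stand in for real extracted ones.
X = np.random.randn(32, NUM_FRAMES, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(32,))
model.fit(X, y, epochs=1, batch_size=8)
```

Splitting the pipeline this way keeps training cheap: the expensive CNN runs once per frame offline, and only the small recurrent head is trained on the sequences.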