X-CLIP Open-Source Model - A General Video-Language Understanding Tool Supporting Video Classification and Text Retrieval

Xclip Base Patch16 Hmdb 16 Shot

Developed by Microsoft

X-CLIP is an extended version of CLIP for general video-language understanding, supporting video classification and video-text retrieval tasks.

Release Time: 9/7/2022

Model Overview

This X-CLIP model (base-sized, 16x16 patch resolution) was trained on HMDB-51 in a few-shot setting (K=16, i.e. 16 labeled videos per class) and is intended for video classification.

Few-shot Learning

Training used only K=16 labeled videos per HMDB-51 class, which makes the model relevant to settings where labeled video data is scarce.
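To illustrate what K=16 means in practice, the sketch below draws at most 16 labeled videos per class from a larger labeled list. It assumes the data is a simple list of (video_id, label) pairs and uses a fixed random seed; the helper `sample_k_shot` and the toy data are hypothetical, and the actual HMDB-51 split used for this checkpoint is not described on this page.

```python
import random
from collections import defaultdict


def sample_k_shot(samples, k=16, seed=0):
    """Hypothetical helper: keep at most k (video_id, label) pairs per class.

    Generic K-shot sampling for illustration only; not the split used to
    train this checkpoint.
    """
    by_class = defaultdict(list)
    for video_id, label in samples:
        by_class[label].append(video_id)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_class):
        videos = by_class[label]
        # sample without replacement; classes with fewer than k videos keep all
        chosen = rng.sample(videos, min(k, len(videos)))
        subset.extend((video_id, label) for video_id in chosen)
    return subset


# Toy data: 3 hypothetical classes with 20 videos each.
toy = [(f"{c}_{i}", c) for c in ("run", "jump", "wave") for i in range(20)]
subset = sample_k_shot(toy, k=16)
print(len(subset))  # 48 = 3 classes x 16 shots
```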

Video-Text Contrastive Learning

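The model is trained with a video-text contrastive objective, which pulls the embeddings of matching videos and descriptions together and pushes mismatched pairs apart, so that videos and text can be compared directly in a shared embedding space.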
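A minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired embeddings, assuming the i-th video matches the i-th text. This is the generic objective only, not X-CLIP's actual training code; the fixed temperature of 0.07 and the function names are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_entropy(logits, target):
    # numerically stable: log(sum(exp(z))) - z[target]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Generic CLIP-style loss (assumed temperature); pair i is the positive."""
    n = len(video_embs)
    logits = [[cosine(v, t) / temperature for t in text_embs] for v in video_embs]
    # video -> text: each row should peak on its diagonal entry
    loss_v2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # text -> video: the same on the transposed similarity matrix
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2v = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return (loss_v2t + loss_t2v) / 2


# Toy 2-D embeddings: aligned pairs should give a lower loss than swapped ones.
videos = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
print(symmetric_contrastive_loss(videos, texts) < symmetric_contrastive_loss(videos, texts[::-1]))  # True
```

Averaging the two directions makes the loss penalize a video that fails to retrieve its text and a text that fails to retrieve its video equally.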

Input Frames and Resolution

During training, the model uses 32 frames per video at a resolution of 224x224.
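One common way to reduce a video of arbitrary length to a fixed 32-frame clip is uniform segment sampling: split the video into 32 equal segments and take the middle frame of each. The sketch below assumes that scheme (the exact sampler used to train this model is not stated here) and only computes frame indices; each selected frame would then be resized to 224x224.

```python
def sample_frame_indices(num_video_frames, num_samples=32):
    """Assumed uniform sampler: middle frame of each of `num_samples` segments.

    Videos shorter than `num_samples` frames simply repeat frames.
    """
    if num_video_frames <= 0:
        raise ValueError("video has no frames")
    seg = num_video_frames / num_samples
    return [min(int(seg * i + seg / 2), num_video_frames - 1) for i in range(num_samples)]


print(sample_frame_indices(320)[:4])  # [5, 15, 25, 35]
```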

Video Classification

Video-Text Retrieval

Few-shot Learning

Video Understanding

Action Recognition

Recognizes human actions in videos, such as running or jumping.
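A CLIP-style model typically recognizes an action by scoring the video embedding against one text embedding per class label and picking the most similar. The sketch below shows only that scoring step (cosine similarity followed by a softmax), assuming the embeddings are already computed; the `classify` helper, the toy vectors, and the temperature of 0.01 are hypothetical stand-ins for real encoder outputs and the model's learned scale.

```python
import math


def classify(video_emb, class_text_embs, labels, temperature=0.01):
    """Hypothetical helper: return (best_label, probabilities) over `labels`."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    v = normalize(video_emb)
    # cosine similarity to each class text embedding, scaled by 1/temperature
    logits = [sum(a * b for a, b in zip(v, normalize(t))) / temperature
              for t in class_text_embs]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


labels = ["running", "jumping", "waving"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy vectors
video_emb = [0.1, 0.9, 0.2]  # toy video embedding closest to "jumping"
label, probs = classify(video_emb, text_embs, labels)
print(label)  # jumping
```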

Achieves a top-1 accuracy of 64.0% on HMDB-51 in this 16-shot setting.
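Top-1 accuracy is the fraction of test videos whose highest-scoring class is the correct one. A minimal sketch of the metric on toy scores (not real model outputs); the function name `top1_accuracy` is hypothetical.

```python
def top1_accuracy(scores, true_labels):
    """Fraction of rows whose argmax class index equals the true label index.

    `scores[i][c]` is the score for video i and class c.
    """
    if len(scores) != len(true_labels) or not scores:
        raise ValueError("need one non-empty score row per label")
    correct = sum(
        1 for row, y in zip(scores, true_labels)
        if max(range(len(row)), key=row.__getitem__) == y
    )
    return correct / len(true_labels)


scores = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3]]
true_labels = [0, 2, 1, 1]
print(top1_accuracy(scores, true_labels))  # 0.75
```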

Property	Details
Model Type	X-CLIP (base-sized, patch resolution of 16)
Training Data	HMDB-51
