X-CLIP Base Patch16 HMDB 2-Shot
X-CLIP is an extended version of CLIP for general video-language understanding, trained via contrastive learning on video-text pairs, supporting zero-shot, few-shot, and fully supervised video classification tasks.
Release Date: 9/7/2022
Model Overview
The X-CLIP model (base size, 16x16 patch resolution) is trained in a few-shot manner (K=2) on HMDB-51, suitable for tasks like video classification and video-text retrieval.
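A minimal usage sketch with the Hugging Face transformers library, assuming the checkpoint is available on the Hub as microsoft/xclip-base-patch16-hmdb-2-shot; the label strings are illustrative, the expected frame count is read from the model config, and the random frames stand in for a real decoded clip:

```python
# Hedged sketch of action classification with this checkpoint via transformers.
# The Hub id and label strings are assumptions for illustration.
import numpy as np
import torch
from transformers import XCLIPProcessor, XCLIPModel

model_id = "microsoft/xclip-base-patch16-hmdb-2-shot"  # assumed Hub id
processor = XCLIPProcessor.from_pretrained(model_id)
model = XCLIPModel.from_pretrained(model_id)

# Placeholder clip: a list of frames (H, W, C) sampled from one video.
num_frames = model.config.vision_config.num_frames
video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(num_frames)]

# Candidate HMDB-51-style action labels to score against the clip.
labels = ["brushing hair", "riding a bike", "playing guitar"]

inputs = processor(text=labels, videos=[video], return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Video-to-text similarity scores; softmax gives per-label probabilities.
probs = outputs.logits_per_video.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```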
Model Features
Few-shot Learning Capability
This model was trained with only 2 labeled examples per class (K=2) on the HMDB-51 dataset, demonstrating strong few-shot learning capability.
Video-Text Contrastive Learning
Trained via contrastive learning, it can relate video content to text descriptions; see the embedding sketch after this feature list.
Multi-task Support
Supports zero-shot, few-shot, and fully supervised video classification tasks, as well as applications like video-text retrieval.
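The contrastive objective places videos and captions in a shared embedding space. The sketch below reuses the processor, model, and video objects from the previous snippet, embeds the clip and a few captions separately, and ranks the captions by cosine similarity; the caption strings are illustrative, not part of the library API:

```python
# Hedged sketch of the shared video-text embedding space learned contrastively.
import torch

captions = ["a person doing a cartwheel", "someone pouring a drink"]

text_inputs = processor(text=captions, return_tensors="pt", padding=True)
video_inputs = processor(videos=[video], return_tensors="pt")

with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)    # (num_captions, dim)
    video_embeds = model.get_video_features(**video_inputs) # (1, dim)

# Cosine similarity in the shared embedding space.
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
video_embeds = video_embeds / video_embeds.norm(dim=-1, keepdim=True)
similarity = video_embeds @ text_embeds.T
print(similarity)
```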
Model Capabilities
Video Classification
Video-Text Retrieval
Few-shot Learning
Zero-shot Inference
Use Cases
Video Understanding
Action Recognition
Recognize human actions in videos
Achieved 53.0% top-1 accuracy on the HMDB-51 dataset
Video Content Retrieval
Retrieve relevant video clips based on text descriptions
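For retrieval, a single text query can be scored against several candidate clips via the model's logits_per_text output. A hedged sketch, again reusing the processor, model, and num_frames defined above, with random placeholder clips standing in for real decoded videos:

```python
# Illustrative text-to-video retrieval sketch (not a prescribed pipeline):
# score one query against several clips and return them ranked.
import numpy as np
import torch

query = "a person climbing stairs"
clips = [
    [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(num_frames)]
    for _ in range(3)
]

inputs = processor(text=[query], videos=clips, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text has shape (num_texts, num_videos); higher means a better match.
scores = outputs.logits_per_text[0]
ranking = torch.argsort(scores, descending=True)
print("clips ranked by relevance to the query:", ranking.tolist())
```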