Xclip Base Patch16 Hmdb 16 Shot
X-CLIP is an extended version of CLIP for general video-language understanding, supporting video classification and video-text retrieval tasks.
Downloads 49
Release Date: 9/7/2022
Model Overview
This X-CLIP model (base size, 16x16 patch resolution) was trained on HMDB-51 in a few-shot setting (K=16, i.e. 16 labeled videos per class) and is suited to video classification tasks.
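As a rough illustration of how such a model could be used for classification, the sketch below scores one video against a list of candidate label prompts with the Hugging Face Transformers X-CLIP classes. The Hub identifier, the helper name classify_video, and the expectation of 32 RGB frames per clip are assumptions made for this example; they are not stated on this page.

```python
# Assumed Hub identifier for this checkpoint; verify it before relying on it.
MODEL_ID = "microsoft/xclip-base-patch16-hmdb-16-shot"


def classify_video(frames, candidate_labels, model_id=MODEL_ID):
    """Return {label: probability} for a single video.

    frames: list of HxWx3 uint8 arrays (assumed: 32 frames per clip).
    candidate_labels: list of text prompts, one per class.
    """
    # Imported lazily so that defining the helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import XCLIPModel, XCLIPProcessor

    processor = XCLIPProcessor.from_pretrained(model_id)
    model = XCLIPModel.from_pretrained(model_id)
    model.eval()

    # The processor resizes and normalizes frames and tokenizes the prompts.
    inputs = processor(
        text=candidate_labels,
        videos=list(frames),
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_video has shape (num_videos, num_prompts); one video here.
    probs = outputs.logits_per_video.softmax(dim=1)[0]
    return {label: float(p) for label, p in zip(candidate_labels, probs)}
```

Calling classify_video(frames, ["run", "jump", "walk"]) would download the weights on first use; the label prompts here are placeholders, not the official HMDB-51 class list.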
Model Features
Few-shot Learning
Because training uses only K=16 labeled videos per class from the HMDB-51 dataset, the model targets scenarios where annotated data is scarce.
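To make the K=16 setting concrete, here is a minimal sketch of how a K-shot training subset could be drawn from a labeled video list. The function name sample_k_shot, the (video_id, label) tuple layout, and the fixed seed are illustrative assumptions; the page does not describe the sampling procedure that was actually used.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k=16, seed=0):
    """Pick up to k examples per class from a list of (video_id, label) pairs."""
    by_label = defaultdict(list)
    for video_id, label in examples:
        by_label[label].append(video_id)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for label in sorted(by_label):
        ids = by_label[label]
        # Classes with fewer than k videos contribute everything they have.
        chosen = rng.sample(ids, min(k, len(ids)))
        subset.extend((video_id, label) for video_id in chosen)
    return subset
```

With the 51 classes of HMDB-51 and K=16, a subset drawn this way contains 51 x 16 = 816 training videos.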
Video-Text Contrastive Learning
The model is trained contrastively on (video, text) pairs, so that matching videos and text descriptions receive higher similarity scores than mismatched ones.
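The scoring step behind this kind of video-text matching can be sketched in a few lines. This is a generic CLIP-style illustration, not the model's exact training objective: the embeddings are toy vectors, and the scale value of 100.0 is an assumed temperature chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_probabilities(video_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities of one video to each text."""
    logits = [scale * cosine(video_emb, t) for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Classification then amounts to picking the text prompt with the highest probability, which is why the same model can serve both classification and retrieval.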
Frame Count and Resolution
The model was trained with 32 frames per video at a resolution of 224x224 pixels, so input clips should generally be sampled and resized to match.
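One simple way to reduce a clip of arbitrary length to 32 frames is to pick evenly spaced frame indices. The sketch below assumes uniform sampling; the page does not say which sampling strategy was used in training, and sample_frame_indices is a hypothetical helper name.

```python
def sample_frame_indices(total_frames, num_frames=32):
    """Return num_frames evenly spaced indices in [0, total_frames - 1]."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    last = total_frames - 1
    # Integer arithmetic avoids floating-point rounding at the endpoints.
    # Clips shorter than num_frames simply repeat some indices.
    return [(i * last) // (num_frames - 1) for i in range(num_frames)]
```

The selected frames would then be resized to 224x224 before being passed to the model, a step that a preprocessing pipeline typically handles.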
Model Capabilities
Video Classification
Video-Text Retrieval
Few-shot Learning
Use Cases
Video Understanding
Action Recognition
Recognizes human actions in videos, such as running and jumping.
The model achieves a top-1 accuracy of 64.0% on the HMDB-51 dataset.
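For reference, top-1 accuracy is the fraction of videos whose highest-scoring class equals the true class. A minimal sketch of that computation follows; the function name and the toy scores in the usage note are illustrative and unrelated to the reported 64.0% figure.

```python
def top1_accuracy(scores, labels):
    """scores: one list of per-class scores per video; labels: true class indices."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)
```

For example, if three of four toy predictions match their labels, the function returns 0.75.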