X-CLIP Base Patch16 HMDB 4-Shot
X-CLIP is a minimal extension of CLIP for general video-language understanding, trained contrastively on (video, text) pairs.
Release Time: 9/7/2022
Model Overview
This is a base-sized X-CLIP model with a patch resolution of 16, trained in a few-shot fashion (K=4) on the HMDB-51 dataset. It is suited to video classification tasks.
Model Features
Few-shot learning capability
The model was trained with only 4 examples per class (K=4) on the HMDB-51 dataset, which shows that it can learn from very little labeled data.
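To make the few-shot setup concrete, here is a minimal sketch of building a K-shot training subset, assuming K=4 means four training videos per action class (HMDB-51 has 51 classes, so that reading gives 51 x 4 = 204 videos). The function name `sample_k_shot`, the toy file names, and the fixed seed are hypothetical; this is not the authors' sampling code.

```python
import random
from collections import defaultdict


def sample_k_shot(labeled_items, k=4, seed=0):
    """Pick up to k items per class from an iterable of (item, label) pairs."""
    by_label = defaultdict(list)
    for item, label in labeled_items:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        chosen = rng.sample(items, min(k, len(items)))
        subset.extend((item, label) for item in chosen)
    return subset


# Toy stand-in for a labeled video list: 3 hypothetical classes, 10 clips each.
toy = [(f"{label}_{i}.avi", label)
       for label in ("run", "jump", "wave")
       for i in range(10)]
few_shot = sample_k_shot(toy, k=4)
print(len(few_shot))  # 3 classes x 4 clips
```

Sampling per class rather than from the pooled list guarantees that every class is represented, which matters when the total budget is this small.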
Video-text contrastive learning
Trains on (video, text) pairs with a contrastive objective, so that a video's embedding lies close to the embedding of its matching text.
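As an illustration of contrastive learning on (video, text) pairs, the sketch below computes a generic CLIP-style symmetric loss on toy embeddings. It assumes pre-computed embedding vectors and a hypothetical temperature of 0.07; the exact objective used to train this checkpoint is not stated here, so treat this as the general idea rather than the model's training code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Symmetric loss: video-to-text and text-to-video, averaged."""
    v = [normalize(x) for x in video_embs]
    t = [normalize(x) for x in text_embs]
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
              for vi in v]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))


videos = [[1.0, 0.0], [0.0, 1.0]]
matched = contrastive_loss(videos, [[1.0, 0.1], [0.1, 1.0]])
shuffled = contrastive_loss(videos, [[0.1, 1.0], [1.0, 0.1]])
print(matched < shuffled)
```

The loss is low when each video is most similar to its own text and high when the pairing is shuffled, which is the signal that pulls matching pairs together during training.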
Efficient video processing
The model was trained on 32 frames per video at a resolution of 224x224, a setting that balances computational cost against model performance.
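The text does not say how the 32 frames are chosen from a clip. One common approach, shown here purely as an assumption, is to split the clip into 32 equal segments and take the middle frame of each; `sample_frame_indices` is a hypothetical helper name.

```python
def sample_frame_indices(total_frames, num_frames=32):
    """Return num_frames indices spread evenly across a clip.

    The clip is divided into num_frames equal segments and the midpoint
    of each segment is selected. Clips shorter than num_frames yield
    repeated indices.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    segment = total_frames / num_frames
    return [min(total_frames - 1, int(segment * (i + 0.5)))
            for i in range(num_frames)]


indices = sample_frame_indices(320)   # a hypothetical 320-frame clip
short = sample_frame_indices(10)      # a clip shorter than 32 frames
print(len(indices), indices[:3])
```

Each selected frame would then be resized or cropped to 224x224 before being passed to the model.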
Model Capabilities
Video classification
Video-text matching
Few-shot learning
Use Cases
Video understanding
Human action recognition
Recognizing human action categories in videos
Achieves 57.3% top-1 accuracy on the HMDB-51 dataset
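Top-1 accuracy is the fraction of videos whose highest-scoring class is the correct one. A minimal sketch of that computation on made-up scores follows; the numbers are illustrative only and are unrelated to the reported 57.3%.

```python
def top1_accuracy(score_rows, true_labels):
    """Fraction of rows whose highest-scoring class index equals the label."""
    correct = 0
    for scores, label in zip(score_rows, true_labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == label
    return correct / len(true_labels)


# Toy per-class scores for 4 videos over 3 classes.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],  # misclassified: the true class is 1
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 1, 2]
accuracy = top1_accuracy(scores, labels)
print(accuracy)  # 3 of 4 correct
```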
Video retrieval
Text-based video retrieval
Retrieving relevant video clips based on text descriptions
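Text-based retrieval can be sketched as ranking clips by the cosine similarity between a text-query embedding and each clip embedding. The example assumes the embeddings have already been produced by the model's text and video encoders; the clip names, vectors, and the `retrieve` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, clip_embs, top_k=2):
    """Return the names of the top_k clips most similar to the query."""
    ranked = sorted(clip_embs.items(),
                    key=lambda kv: cosine(query_emb, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Toy embeddings standing in for encoder outputs.
clips = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 1.0, 0.2],
    "clip_c": [0.6, 0.5, 0.1],
}
query = [1.0, 0.0, 0.0]
results = retrieve(query, clips, top_k=2)
print(results)
```

For a large collection, the clip embeddings would normally be computed once and stored, so that each query costs only one text encoding plus the similarity search.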