X-CLIP Base Patch16 UCF 16-Shot
X-CLIP is an extended version of CLIP for general video-language understanding, supporting zero-shot, few-shot, or fully supervised video classification tasks.
Downloads 92
Release Time: 9/7/2022
Model Overview
The X-CLIP model was trained in a few-shot manner (K=16) on the UCF101 dataset, primarily for video classification and video-text retrieval tasks.
Model Features
Few-shot Learning
This model was trained using only 16 labeled videos per class (K=16), demonstrating strong few-shot learning capabilities.
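In a K-shot setup, K normally counts labeled examples per class, so the training subset has at most K videos for each category. The following is a minimal sketch of how such a subset could be drawn. The function name `sample_k_shot`, the toy file names, and the fixed seed are illustrative assumptions, not the procedure used to train this checkpoint.

```python
import random
from collections import defaultdict


def sample_k_shot(samples, k=16, seed=0):
    """Pick at most k items per class from (video_id, label) pairs."""
    by_label = defaultdict(list)
    for video_id, label in samples:
        by_label[label].append(video_id)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        videos = by_label[label]
        chosen = rng.sample(videos, min(k, len(videos)))
        subset.extend((video_id, label) for video_id in chosen)
    return subset


# Hypothetical toy data: 3 classes with 40 clips each.
toy = [
    (f"{label}_{i:03d}.avi", label)
    for label in ("Archery", "Biking", "Diving")
    for i in range(40)
]
few_shot = sample_k_shot(toy, k=16)
print(len(few_shot))  # 3 classes * 16 clips each
```

With K=16 and the 101 classes of UCF101, the same procedure would yield at most 1,616 training videos.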
Video-Text Contrastive Learning
Trained in a contrastive manner on (video, text) pairs, supporting video-text matching tasks.
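Contrastive matching of this kind is usually scored by comparing one video embedding against several text embeddings and applying a softmax over the similarities. The sketch below illustrates that idea with plain Python lists. The three-dimensional vectors, the `temperature` value, and the helper names are assumptions made for illustration; the real model produces its own embeddings and learned scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_probabilities(video_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities."""
    logits = [cosine(video_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings standing in for encoder outputs.
video = [0.9, 0.1, 0.0]
texts = {
    "playing guitar": [1.0, 0.0, 0.0],
    "swimming": [0.0, 1.0, 0.0],
    "archery": [0.0, 0.0, 1.0],
}
probs = match_probabilities(video, list(texts.values()))
best = max(zip(texts, probs), key=lambda pair: pair[1])[0]
print(best)
```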
High Accuracy
Achieves 91.4% top-1 accuracy on the UCF101 dataset in the 16-shot setting.
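Top-1 accuracy is the fraction of videos whose highest-scoring class equals the ground-truth label. A minimal sketch of the metric follows; the scores and labels are made-up toy values and are unrelated to the reported UCF101 result.

```python
def top1_accuracy(scores, labels):
    """Fraction of rows whose arg-max index equals the true label index."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Hypothetical per-class scores for four videos and three classes.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],  # predicted class 1, true class 0: counted as wrong
    [0.2, 0.2, 0.6],
]
labels = [0, 1, 0, 2]
acc = top1_accuracy(scores, labels)
print(f"{acc:.1%}")
```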
Model Capabilities
Video Classification
Video-Text Retrieval
Few-shot Learning
Use Cases
Video Understanding
Video Classification
Classify video content, suitable for scenarios such as video content management and recommendation systems.
Achieves a top-1 accuracy of 91.4% on the UCF101 dataset.
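For classification with the Hugging Face `transformers` library, a sketch along the following lines may work. It assumes that `torch` and `transformers` are installed, that the checkpoint is published under the hub name `microsoft/xclip-base-patch16-ucf-16-shot` (not stated on this page), and that `frames` is a list of H x W x 3 image arrays whose length matches the frame count in the checkpoint's configuration. The function name `classify_video` is hypothetical.

```python
def classify_video(frames, labels,
                   checkpoint="microsoft/xclip-base-patch16-ucf-16-shot"):
    """Score one video against candidate text labels (sketch, see assumptions)."""
    # Imported lazily so the definition loads without the heavy dependencies.
    import torch
    from transformers import XCLIPModel, XCLIPProcessor

    processor = XCLIPProcessor.from_pretrained(checkpoint)
    model = XCLIPModel.from_pretrained(checkpoint)

    # The processor tokenizes the labels and preprocesses the frames together.
    inputs = processor(text=labels, videos=list(frames),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_video has shape (num_videos, num_labels).
    probs = outputs.logits_per_video.softmax(dim=1)[0]
    return dict(zip(labels, probs.tolist()))
```

A call might look like `classify_video(frames, ["playing guitar", "swimming", "archery"])`, returning one probability per label.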
Video-Text Retrieval
Retrieve relevant videos based on text descriptions, suitable for video search and content moderation scenarios.
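Retrieval runs the matching in the other direction: a text query is embedded once, then compared against a precomputed index of video embeddings. The sketch below ranks videos by cosine similarity. The index contents, clip names, and the `retrieve` helper are hypothetical stand-ins for real encoder outputs.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def retrieve(query_emb, video_index, top_k=2):
    """Return the top_k video ids ranked by cosine similarity to the query."""
    query = normalize(query_emb)
    scored = []
    for video_id, emb in video_index.items():
        unit = normalize(emb)
        similarity = sum(a * b for a, b in zip(query, unit))
        scored.append((similarity, video_id))
    scored.sort(reverse=True)  # highest similarity first
    return [video_id for _, video_id in scored[:top_k]]


# Hypothetical precomputed video embeddings.
index = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.9, 0.1],
    "clip_c": [0.7, 0.6, 0.1],
}
results = retrieve([1.0, 0.2, 0.0], index, top_k=2)
print(results)
```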