X-CLIP Base Patch16 UCF101 8-Shot
X-CLIP is a minimal extension of CLIP for general video-language understanding. It is trained contrastively on (video, text) pairs and can be used for zero-shot, few-shot, or fully supervised video classification, as well as video-text retrieval.
Release date: 9/7/2022
Model Overview
The X-CLIP model (base size, 16x16 patch resolution) was trained in a few-shot setting (K=8) on UCF101 and is suited to video classification tasks.
Model Features
Few-shot Learning
This model was trained in a few-shot setting on UCF101, with K=8 labeled examples per class, making it suitable for applications with limited labeled data.
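In the K=8 setup, eight labeled clips per action class are drawn from the training split to form the support set. A minimal sketch of that sampling step (the function and data layout here are illustrative, not X-CLIP's actual training code):

```python
import random
from collections import defaultdict

def build_few_shot_support(samples, k=8, seed=0):
    """Sample at most k (clip_id, label) pairs per class for a K-shot support set.

    `samples` is a list of (clip_id, label) tuples; the grouping and
    sampling here only illustrate the K=8 setup, not X-CLIP's pipeline.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for clip_id, label in samples:
        by_class[label].append(clip_id)
    support = []
    for label, clips in sorted(by_class.items()):
        # Take min(k, available) clips so small classes are not oversampled.
        chosen = rng.sample(clips, min(k, len(clips)))
        support.extend((c, label) for c in chosen)
    return support
```

For UCF101's 101 action classes, this yields at most 101 × 8 = 808 training clips.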
Video-Text Contrastive Learning
The model is trained contrastively on (video, text) pairs, supporting video-text retrieval tasks.
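Contrastive training aligns each clip's embedding with its caption's embedding; at inference, class names (or queries) are embedded as text and the clip is matched to the most similar one. A minimal NumPy sketch of that similarity step, assuming placeholder embeddings and temperature rather than X-CLIP's own outputs:

```python
import numpy as np

def classify_by_similarity(video_emb, text_embs, temperature=0.01):
    """Score one video embedding against candidate text embeddings.

    video_emb: (d,) array; text_embs: (n_candidates, d) array.
    Both are L2-normalized, cosine similarities are scaled by
    1/temperature, and a softmax turns them into probabilities,
    mirroring CLIP-style contrastive inference.
    """
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ v / temperature   # cosine similarity per candidate
    logits -= logits.max()         # stabilize the softmax numerically
    probs = np.exp(logits)
    return probs / probs.sum()
```

The same scoring works in both directions: ranking class-name prompts for classification, or ranking video embeddings against a text query for retrieval.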
High Accuracy
On the UCF101 dataset, the model achieves a top-1 accuracy of 88.3%.
Model Capabilities
Video Classification
Video-Text Retrieval
Few-shot Learning
Use Cases
Video Understanding
Video Classification
Classify video content, suitable for video content analysis and management.
Achieves a top-1 accuracy of 88.3% on the UCF101 dataset.
Video-Text Retrieval
Retrieve relevant video content based on text descriptions, suitable for video search and recommendation systems.