X-CLIP Base Patch16 UCF 4-Shot

Developed by Microsoft
X-CLIP is a minimal extension of CLIP for general video-language understanding, trained via contrastive learning with (video, text) pairs.
Release date: September 7, 2022

Model Overview

The X-CLIP model (base-sized, 16×16 patch resolution) was trained on UCF101 in a few-shot setting (K=4). It can be used for zero-shot, few-shot, or fully supervised video classification, as well as video-text retrieval.
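As a minimal sketch of how CLIP-style video classification works, the video embedding is compared against one text embedding per candidate label prompt (e.g. "a video of {label}"), and a softmax over the scaled cosine similarities yields class probabilities. The random vectors below are stand-ins for real X-CLIP encoder outputs; the embedding width and logit scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: one video embedding and one text
# embedding per candidate label prompt ("a video of <label>").
video_emb = rng.normal(size=512)
text_embs = rng.normal(size=(3, 512))  # e.g. 3 UCF101 label prompts

def normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Cosine similarity between the video and each label prompt,
# scaled by a learned temperature (logit scale) as in CLIP.
logit_scale = 100.0
logits = logit_scale * normalize(text_embs) @ normalize(video_emb)

# Softmax over labels gives classification probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))
```

In the real model, the label whose prompt embedding scores highest is the predicted action class.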

Model Features

Few-shot Learning
The model is trained on the UCF101 dataset in a few-shot manner (K=4), suitable for scenarios with limited data.
Video-Text Contrastive Learning
Trained via contrastive learning with (video, text) pairs, supporting video-text matching tasks.
General Video Recognition
The model can be used for zero-shot, few-shot, or fully supervised video classification and video-text retrieval tasks.
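The contrastive objective mentioned above can be sketched as a symmetric InfoNCE-style loss: within a batch, matched (video, text) pairs lie on the diagonal of the similarity matrix and are pulled together, while mismatched pairs are pushed apart. This is an illustrative reimplementation with random stand-in embeddings, not the model's actual training code:

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(video_embs, text_embs, logit_scale=100.0):
    """Symmetric contrastive loss over a batch of (video, text) pairs.

    Matched pairs sit on the diagonal of the similarity matrix; the
    loss is cross-entropy toward the diagonal in both directions
    (video -> text and text -> video)."""
    logits = logit_scale * normalize(video_embs) @ normalize(text_embs).T

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))                # diagonal = correct pair

    return 0.5 * (xent(logits) + xent(logits.T))

batch = 4
v = rng.normal(size=(batch, 512))
t = rng.normal(size=(batch, 512))
loss_random = info_nce(v, t)   # unrelated embeddings: high loss
loss_matched = info_nce(v, v)  # identical embeddings: near-zero loss
```

Lower loss when video and text embeddings agree is exactly what drives the encoders toward a shared video-language space.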

Model Capabilities

Video Classification
Video-Text Retrieval
Zero-shot Learning
Few-shot Learning

Use Cases

Video Understanding
Video Classification
Classify video content, applicable to the 101 action categories in the UCF101 dataset.
Top-1 accuracy reaches 83.4%
Video-Text Retrieval
Retrieve relevant videos based on text descriptions or generate matching text descriptions based on video content.
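Text-to-video retrieval reduces to ranking a gallery of video embeddings by cosine similarity against a text query embedding. A toy sketch with random stand-in vectors (the query is constructed near one gallery item so the ranking is meaningful):

```python
import numpy as np

rng = np.random.default_rng(2)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for a small gallery of video embeddings and one text query.
gallery = normalize(rng.normal(size=(5, 512)))
# Make the query resemble gallery video 3, with a little noise.
query = normalize(gallery[3] + 0.01 * rng.normal(size=512))

# Text-to-video retrieval: rank gallery videos by cosine similarity.
scores = gallery @ query
ranking = np.argsort(-scores)  # best match first
top1 = int(ranking[0])
```

Video-to-text retrieval works the same way with the roles swapped: one video embedding ranked against a gallery of caption embeddings.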