
Xclip Base Patch16 Ucf 16 Shot

Developed by Microsoft
X-CLIP is an extended version of CLIP for general video-language understanding, supporting zero-shot, few-shot, or fully supervised video classification tasks.
Downloads 92
Release Time: 9/7/2022

Model Overview

The X-CLIP model was trained in a few-shot manner (K=16) on the UCF101 dataset, primarily for video classification and video-text retrieval tasks.
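To make the K=16 setting concrete: few-shot training draws K labeled videos from each class, so with UCF101's 101 classes the training subset would hold 16 × 101 = 1,616 videos. The sketch below shows one way such a subset could be sampled. The function name `sample_k_shot`, the toy class names, and the video IDs are hypothetical and not taken from the X-CLIP code base.

```python
import random
from collections import defaultdict

def sample_k_shot(labeled_videos, k=16, seed=0):
    """Pick up to k videos per class from (video_id, label) pairs."""
    by_class = defaultdict(list)
    for video_id, label in labeled_videos:
        by_class[label].append(video_id)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = {}
    for label, videos in sorted(by_class.items()):
        # Sample without replacement; keep every video if a class has fewer than k.
        subset[label] = rng.sample(videos, min(k, len(videos)))
    return subset

# Toy data (hypothetical): 3 classes with 40 videos each.
toy = [(f"{label}_{i:03d}", label)
       for label in ("Archery", "Biking", "Surfing")
       for i in range(40)]
few_shot = sample_k_shot(toy, k=16)
total_videos = sum(len(v) for v in few_shot.values())
```

With three toy classes the subset holds 16 videos per class; the same procedure scales to any number of classes.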

Model Features

Few-shot Learning
This model was trained using only 16 labeled videos per class (K=16), demonstrating strong few-shot learning capabilities.
Video-Text Contrastive Learning
Trained in a contrastive manner on (video, text) pairs, supporting video-text matching tasks.
High Accuracy
Achieves a top-1 accuracy of 91.4% on the UCF101 dataset in the 16-shot setting.
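The contrastive setup described above can be illustrated with a minimal sketch: a video embedding is compared against one text embedding per class label, and a softmax over the scaled cosine similarities gives class probabilities. The vectors here are toy values, not real X-CLIP outputs, and `classify` and `scale=100.0` are assumptions made for illustration (CLIP-style models typically learn this scale during training).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(video_emb, text_embs, scale=100.0):
    """Softmax over scaled video-text similarities (scale is an assumed value)."""
    logits = [scale * cosine(video_emb, emb) for emb in text_embs.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(text_embs, exps)}

# Toy embeddings (hypothetical): one text vector per class label.
text_embs = {
    "archery": [1.0, 0.0, 0.0],
    "biking": [0.0, 1.0, 0.0],
    "surfing": [0.0, 0.0, 1.0],
}
video_emb = [0.2, 0.9, 0.1]
probs = classify(video_emb, text_embs)
predicted = max(probs, key=probs.get)
```

Top-1 accuracy is then the fraction of test videos whose `predicted` label matches the ground-truth label.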

Model Capabilities

Video Classification
Video-Text Retrieval
Few-shot Learning

Use Cases

Video Understanding
Video Classification
Classify video content, suitable for scenarios such as video content management and recommendation systems.
Achieves a top-1 accuracy of 91.4% on the UCF101 dataset.
Video-Text Retrieval
Retrieve relevant videos based on text descriptions, suitable for video search and content moderation scenarios.
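Video-text retrieval reverses the direction of classification: a single text query is scored against many video embeddings and the best matches are returned. The following is a minimal sketch with toy vectors; `retrieve`, the clip IDs, and the embeddings are hypothetical, and a real system would obtain the embeddings from the model's video and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, video_embs, top_k=2):
    """Return the top_k (video_id, similarity) pairs, best match first."""
    scored = [(cosine(query_emb, emb), vid) for vid, emb in video_embs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(vid, score) for score, vid in scored[:top_k]]

# Toy video library (hypothetical embeddings).
library = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.8, 0.3],
    "clip_c": [0.0, 0.2, 0.9],
}
query = [0.0, 0.1, 1.0]  # stands in for an encoded text description
results = retrieve(query, library, top_k=2)
```

For a large library, the pairwise loop would usually be replaced by a batched matrix product or an approximate nearest-neighbor index.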