Xclip Large Patch14 Kinetics 600

Developed by Microsoft
X-CLIP is an extended version of CLIP for general video-language understanding, trained on video-text pairs through contrastive learning.
Downloads 124
Release Time: 9/8/2022

Model Overview

The X-CLIP model (large size, patch resolution of 14) was trained in a fully supervised fashion on Kinetics-600. It is suited to tasks such as video classification and video-text retrieval.
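
As a rough illustration of how a checkpoint like this is commonly used for classification, the sketch below scores one clip against candidate text labels with the Hugging Face transformers classes XCLIPProcessor and XCLIPModel. The checkpoint id microsoft/xclip-large-patch14-kinetics-600, the 8-frame clip length, and the helper names sample_frame_indices and classify_clip are assumptions that do not appear on this page. Frame decoding is left to the caller.

```python
def sample_frame_indices(num_frames_total, clip_len=8):
    """Pick clip_len evenly spaced frame indices (hypothetical helper)."""
    if num_frames_total < 1 or clip_len < 1:
        raise ValueError("need at least one frame and a positive clip length")
    step = num_frames_total / clip_len
    # Take the middle frame of each of the clip_len equal segments.
    return [min(num_frames_total - 1, int(step * i + step / 2)) for i in range(clip_len)]


def classify_clip(frames, labels, checkpoint="microsoft/xclip-large-patch14-kinetics-600"):
    """Return {label: probability} for one clip.

    frames: list of HxWx3 uint8 arrays, assumed already decoded and sampled.
    checkpoint: assumed Hugging Face Hub id, not stated on this page.
    """
    # Imported lazily so the sampling helper works without torch/transformers.
    import torch
    from transformers import XCLIPModel, XCLIPProcessor

    processor = XCLIPProcessor.from_pretrained(checkpoint)
    model = XCLIPModel.from_pretrained(checkpoint)
    inputs = processor(text=labels, videos=list(frames), return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_video has shape (num_videos, num_labels); softmax over the labels.
    probs = outputs.logits_per_video.softmax(dim=1)[0]
    return {label: float(p) for label, p in zip(labels, probs)}
```

A call such as classify_clip(frames, ["playing guitar", "surfing"]) would download the weights on first use, so it is best run once and the loaded model reused in real code.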

Model Features

Video-Language Understanding
Trained on video-text pairs through contrastive learning, supporting video classification and video-text retrieval.
High Accuracy
Achieves a top-1 accuracy of 88.3% and a top-5 accuracy of 97.7% on the Kinetics-600 dataset.
Multi-task Support
Can be used for zero-shot, few-shot, or fully supervised video classification as well as video-text retrieval tasks.
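
For readers unfamiliar with the top-1 and top-5 figures quoted above, the following minimal sketch shows how top-k accuracy is usually computed from per-class scores. The function name top_k_accuracy and the score values are made up for illustration and are not outputs of this model.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true class index is among the k highest scores."""
    if len(scores) != len(targets) or not scores:
        raise ValueError("scores and targets must be non-empty and of equal length")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Made-up scores for 4 clips over 6 classes (illustrative only).
scores = [
    [0.1, 0.6, 0.1, 0.1, 0.05, 0.05],  # true class 1: ranked first
    [0.3, 0.2, 0.25, 0.1, 0.1, 0.05],  # true class 2: ranked second
    [0.05, 0.1, 0.15, 0.2, 0.2, 0.3],  # true class 0: ranked last
    [0.2, 0.1, 0.1, 0.4, 0.1, 0.1],    # true class 3: ranked first
]
targets = [1, 2, 0, 3]
top1 = top_k_accuracy(scores, targets, k=1)  # 2 of 4 clips correct
top5 = top_k_accuracy(scores, targets, k=5)  # 3 of 4 clips correct
```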

Model Capabilities

Video classification
Video-text retrieval
Zero-shot learning
Few-shot learning

Use Cases

Video Analysis
Video Classification
Classify video content to recognize actions or scenes in videos.
Achieves 88.3% top-1 accuracy on the Kinetics-600 dataset.
Video-Text Retrieval
Retrieve relevant videos based on text descriptions, or find the text description that best matches a given video.
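
Video-text retrieval with a contrastive model usually comes down to ranking embeddings by similarity. The sketch below assumes the text query and each video have already been encoded into vectors; the three-dimensional embeddings, file names, and the helper rank_videos are hypothetical stand-ins, not values produced by X-CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


def rank_videos(text_embedding, video_embeddings):
    """Return video ids sorted from most to least similar to the text query."""
    scored = [
        (cosine_similarity(text_embedding, emb), vid)
        for vid, emb in video_embeddings.items()
    ]
    return [vid for _, vid in sorted(scored, reverse=True)]


# Toy embeddings; real ones would come from the model's video and text encoders.
library = {
    "surfing.mp4": [0.9, 0.1, 0.0],
    "cooking.mp4": [0.0, 1.0, 0.2],
    "skiing.mp4": [0.6, 0.0, 0.8],
}
query = [1.0, 0.0, 0.1]  # stands in for the embedding of a text description
ranking = rank_videos(query, library)
```

Swapping the roles (one video embedding against many text embeddings) gives the reverse direction, picking the best-matching description for a video.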