
X-CLIP Base Patch16 HMDB 2-Shot

Developed by Microsoft
X-CLIP extends CLIP to general video-language understanding. It is trained with contrastive learning on video-text pairs and supports zero-shot, few-shot, and fully supervised video classification.
Release Time: 9/7/2022

Model Overview

The X-CLIP model (base size, 16x16 patch resolution) was trained in a few-shot manner (K=2, i.e. 2 labeled videos per class) on HMDB-51 and is suitable for tasks such as video classification and video-text retrieval.
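"Patch16" means each video frame is cut into non-overlapping 16x16-pixel patches before entering the vision transformer. The sketch below only illustrates that arithmetic. The 224x224 frame size and the 8-frame clip are assumptions chosen for illustration, not values stated on this page, and the count ignores class tokens or any other extra tokens the model adds.

```python
# Sketch: count ViT-style patch tokens for a 16x16 patch size.
# The 224x224 frame size and 8-frame clip are illustrative assumptions,
# not values taken from this model card.

def patches_per_frame(height: int, width: int, patch_size: int = 16) -> int:
    """Number of non-overlapping patch_size x patch_size patches in one frame."""
    if height % patch_size or width % patch_size:
        raise ValueError("frame height and width must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


def patches_per_clip(num_frames: int, height: int, width: int, patch_size: int = 16) -> int:
    """Patch tokens across a clip when each frame is split into patches separately.

    Excludes class tokens or any other extra tokens the model may add.
    """
    return num_frames * patches_per_frame(height, width, patch_size)


print(patches_per_frame(224, 224))    # 196
print(patches_per_clip(8, 224, 224))  # 1568
```

Halving the patch size would quadruple the token count per frame, which is why patch size is part of the model name.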

Model Features

Few-shot Learning Capability
This model was trained with only 2 labeled videos per class (K=2) from the HMDB-51 dataset, showing that it can adapt to new action classes from very few examples.
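A K-shot training split keeps exactly K labeled examples for each class. The helper below is a hypothetical sketch of building such a split from (video_id, label) pairs; the function name, the seeded sampling, and the toy class names are assumptions for illustration, not the procedure used to train this model.

```python
import random
from collections import defaultdict


def sample_k_shot(samples, k=2, seed=0):
    """Hypothetical helper: keep k randomly chosen videos per class.

    samples: iterable of (video_id, label) pairs.
    Returns a list of (video_id, label) pairs with exactly k per class.
    """
    by_class = defaultdict(list)
    for video_id, label in samples:
        by_class[label].append(video_id)

    rng = random.Random(seed)  # seeded so the split is reproducible
    subset = []
    for label in sorted(by_class):
        videos = by_class[label]
        if len(videos) < k:
            raise ValueError(f"class {label!r} has fewer than {k} examples")
        subset.extend((video_id, label) for video_id in rng.sample(videos, k))
    return subset


# Toy data: 3 classes with 5 videos each (illustrative only).
toy = [(f"{label}_{i}", label) for label in ("brush_hair", "cartwheel", "climb") for i in range(5)]
subset = sample_k_shot(toy, k=2, seed=42)
print(len(subset))  # 6
```

With K=2, the training set size grows with the number of classes (2 x 51 videos for HMDB-51's 51 classes), not with the size of the full dataset.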
Video-Text Contrastive Learning
Contrastive training on video-text pairs teaches the model to place each video close to its matching text description, and far from non-matching ones, in a shared embedding space.
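To make "contrastive learning" concrete, here is a minimal sketch of a generic CLIP-style symmetric contrastive loss over a batch where video i is paired with text i. It is an illustration under stated assumptions, not X-CLIP's exact training objective: the fixed temperature of 0.07 is assumed (CLIP-style models typically learn it), and the embeddings are toy vectors rather than encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def log_softmax_at(row, i):
    """log(softmax(row)[i]), computed stably."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return row[i] - log_z


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Average of video->text and text->video cross-entropy; pair i is the positive."""
    n = len(video_embs)
    # logits[i][j]: scaled similarity between video i and text j
    logits = [[cosine(v, t) / temperature for t in text_embs] for v in video_embs]
    video_to_text = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_video = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return (video_to_text + text_to_video) / 2


videos = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_contrastive_loss(videos, matched_texts))  # close to 0
print(symmetric_contrastive_loss(videos, swapped_texts))  # large (about 14.3)
```

The loss is small when each video is most similar to its own caption and large when captions are swapped, which is the signal that pulls matching pairs together during training.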
Multi-task Support
Supports zero-shot, few-shot, and fully supervised video classification tasks, as well as applications like video-text retrieval.
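Zero-shot classification with a video-text model usually works by embedding one text prompt per class name and picking the class whose text embedding is most similar to the video embedding. The sketch below shows that scoring step with toy vectors. The labels, embeddings, and temperature of 0.01 are assumptions for illustration; real embeddings would come from the model's encoders, and this sketch omits any video-conditioned prompt adaptation the actual model may apply.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(video_emb, class_text_embs, labels, temperature=0.01):
    """Softmax over video-to-class-prompt similarities; returns (best_label, probs)."""
    scores = [cosine(video_emb, t) / temperature for t in class_text_embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy embeddings standing in for encoded prompts such as "a video of someone climbing".
labels = ["climb", "run", "wave"]
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
video_emb = [0.9, 0.1, 0.0]

best, probs = zero_shot_classify(video_emb, class_text_embs, labels)
print(best)  # climb
```

Because classes are defined only by their text prompts, new classes can be added without retraining, which is what makes zero-shot classification possible.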

Model Capabilities

Video Classification
Video-Text Retrieval
Few-shot Learning
Zero-shot Inference

Use Cases

Video Understanding
Action Recognition
Recognize human actions in videos
Achieved 53.0% top-1 accuracy on the HMDB-51 dataset
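Top-1 accuracy is the fraction of test videos whose highest-scoring class equals the true label. The sketch below computes it from a matrix of per-class scores; the scores and labels are toy values, not HMDB-51 results.

```python
def top1_accuracy(score_rows, true_labels):
    """score_rows[i][c] is the score for class c on video i; true_labels[i] is a class index."""
    if not true_labels or len(score_rows) != len(true_labels):
        raise ValueError("need one non-empty score row per label")
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == label  # argmax == true class
        for row, label in zip(score_rows, true_labels)
    )
    return correct / len(true_labels)


scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.5, 0.3, 0.2],  # predicts class 0
    [0.2, 0.2, 0.6],  # predicts class 2
    [0.4, 0.5, 0.1],  # predicts class 1
]
labels = [1, 0, 1, 0]
print(top1_accuracy(scores, labels))  # 0.5
```

Under this definition, 53.0% top-1 accuracy means the model's first choice was correct for 53 out of every 100 evaluated videos.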
Video Content Retrieval
Retrieve relevant video clips based on text descriptions
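Text-to-video retrieval can be framed as ranking a collection of video embeddings by their similarity to one text-query embedding. The sketch below assumes the embeddings have already been computed; the vectors are toy values and `rank_videos` is a hypothetical helper, not part of any library API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_videos(query_emb, video_embs, top_k=3):
    """Hypothetical helper: indices of the top_k videos most similar to the query."""
    scored = [(cosine(query_emb, v), idx) for idx, v in enumerate(video_embs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy embeddings: one text query and four indexed video clips.
query = [1.0, 0.0]
video_embs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5], [-1.0, 0.0]]
print(rank_videos(query, video_embs, top_k=2))  # [1, 2]
```

For large collections the video embeddings would typically be precomputed once and searched with a vector index rather than a full sort, but the ranking criterion stays the same.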