Xclip Base Patch16 Hmdb 8 Shot

Developed by Microsoft
X-CLIP extends CLIP to general video-language understanding. It is trained with contrastive learning on video-text pairs and is suited to video classification and video-text retrieval.
Downloads 17
Release Time: 9/7/2022

Model Overview

This base-size X-CLIP model (patch size 16) was trained in a few-shot (8-shot) setting on the HMDB-51 dataset. It supports video classification and video-text retrieval.

Model Features

Few-shot Learning
The model is trained on HMDB-51 in an 8-shot setting, that is, with 8 labeled videos per class, which suits scenarios where annotated data is scarce.
Video-Language Understanding
Trained through contrastive learning on video-text pairs, supporting video-text matching tasks.
Input Frames and Resolution
Trained with 32 frames per video at a resolution of 224x224 pixels.
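To make the 32-frame, 224x224 input concrete, here is a minimal sketch of picking 32 frame indices from a longer clip. The page does not say how frames are chosen, so the evenly spaced (segment midpoint) strategy and the helper name sample_frame_indices are assumptions for illustration only.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 32) -> List[int]:
    """Pick num_frames indices spread evenly across a clip (assumed strategy)."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Split the clip into num_frames equal segments and take each midpoint.
    # Clips shorter than num_frames repeat frames instead of failing.
    segment = total_frames / num_frames
    return [min(total_frames - 1, int(segment * (i + 0.5))) for i in range(num_frames)]


# Hypothetical 300-frame clip.
indices = sample_frame_indices(total_frames=300)

# Shape of the stacked clip: (frames, channels, height, width).
clip_shape = (len(indices), 3, 224, 224)
print(indices[:4], clip_shape)
```

Each selected frame would then be resized to 224x224 before being passed to the model.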

Model Capabilities

Video Classification
Video-Text Retrieval
Few-shot Learning

Use Cases

Video Analysis
Action Recognition
Identify specific actions in videos, such as running, jumping, etc.
Achieves 62.8% top-1 accuracy on HMDB-51 in the 8-shot setting.
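For readers unfamiliar with the metric, the sketch below shows how top-1 accuracy follows from per-label similarity scores: each clip is assigned the label with the highest score, and accuracy is the fraction of clips labeled correctly. The scores, label names, and helper functions are made up for illustration and are not outputs of the model.

```python
import math


def softmax(scores):
    """Convert raw similarity scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores, labels):
    """Return the label whose video-text similarity score is highest."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def top1_accuracy(predictions, targets):
    """Fraction of clips whose top-ranked label matches the ground truth."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical scores: one row per clip, one column per label.
labels = ["run", "jump", "wave"]
scores_per_clip = [
    [4.1, 1.3, 0.2],
    [0.5, 3.9, 1.0],
    [2.2, 2.0, 2.6],
    [3.0, 3.4, 0.1],
]
targets = ["run", "jump", "wave", "run"]

predictions = [predict_label(row, labels) for row in scores_per_clip]
accuracy = top1_accuracy(predictions, targets)
probabilities = softmax(scores_per_clip[0])
print(predictions, accuracy)
```

In this toy data the last clip is misclassified, so three of four predictions are correct.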
Video-Text Matching
Video Retrieval
Retrieve relevant video clips based on text descriptions.
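Text-based retrieval of this kind is commonly implemented by embedding the query and the videos in a shared space and ranking by similarity. The sketch below assumes the embeddings have already been computed; the vectors, clip names, and the rank_videos helper are hypothetical stand-ins, not part of the model's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def rank_videos(text_embedding, video_embeddings, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(text_embedding, embedding))
        for name, embedding in video_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings; real ones would come from the model.
video_embeddings = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.8, 0.3],
    "clip_c": [0.7, 0.6, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

results = rank_videos(query_embedding, video_embeddings)
for name, score in results:
    print(name, round(score, 3))
```

Ranking by cosine similarity ignores vector length, so only the direction of each embedding affects the order.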