Xclip Base Patch16 Hmdb 4 Shot

Developed by Microsoft
X-CLIP is a minimal extension of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs.
Downloads 22
Release Time: 9/7/2022

Model Overview

This is a base-size X-CLIP model with a patch size of 16 (16x16-pixel patches), trained in a few-shot manner (K=4 examples per class) on the HMDB-51 dataset and suited to video classification tasks.
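As a rough illustration of what the K=4 setting means, the sketch below draws 4 clips per class from a labeled list. HMDB-51 has 51 action classes, so K=4 amounts to 204 training videos. The function name `sample_k_shot`, the toy file names, and the random sampling strategy are assumptions made for illustration, not the procedure used to train this model.

```python
import random
from collections import defaultdict

def sample_k_shot(labeled_videos, k=4, seed=0):
    """Pick up to k videos per class from (video_id, label) pairs."""
    by_class = defaultdict(list)
    for video_id, label in labeled_videos:
        by_class[label].append(video_id)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for label in sorted(by_class):
        videos = by_class[label]
        chosen = rng.sample(videos, min(k, len(videos)))
        subset.extend((video_id, label) for video_id in chosen)
    return subset

# Toy annotation list (hypothetical): 3 classes with 10 clips each.
toy = [(f"{label}_{i:02d}.avi", label)
       for label in ("climb", "jump", "wave") for i in range(10)]
few_shot = sample_k_shot(toy, k=4)
print(len(few_shot))  # 12 = 3 classes x 4 clips
```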

Model Features

Few-shot learning capability
The model is trained with only 4 videos per class (K=4) from the HMDB-51 dataset, demonstrating that it can learn from very few labeled examples.
Video-text contrastive learning
Trained contrastively on (video, text) pairs, so that a video and its matching text description score as more similar than mismatched pairs.
Efficient video processing
Processes 32 frames per video at 224x224 resolution, balancing computational efficiency and model performance.
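One simple way to obtain a fixed 32-frame input from a clip of arbitrary length is to sample frame indices at even intervals. The sketch below is only an assumption about how frames might be chosen; the source does not state the sampling strategy, and `sample_frame_indices` is a hypothetical helper.

```python
def sample_frame_indices(total_frames, num_frames=32):
    """Return num_frames indices spread evenly over [0, total_frames - 1]."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames == 1:
        return [0]
    last = total_frames - 1
    # Evenly spaced positions, rounded to the nearest valid frame index.
    # Clips shorter than num_frames simply repeat some indices.
    return [round(i * last / (num_frames - 1)) for i in range(num_frames)]

indices = sample_frame_indices(total_frames=300)
print(len(indices), indices[0], indices[-1])  # 32 0 299
```

Each selected frame would then be resized to 224x224 before being passed to the model.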

Model Capabilities

Video classification
Video-text matching
Few-shot learning
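Classification through video-text matching can be pictured as comparing one video embedding against one text embedding per class label and applying a softmax over the similarities. The sketch below uses tiny hand-written vectors; the embeddings, the `temperature` value, and the function names are illustrative assumptions, not values taken from this model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(video_emb, label_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities to each label."""
    labels = list(label_embs)
    logits = [cosine(video_emb, label_embs[name]) / temperature for name in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(labels, exps)}

# Hypothetical 3-dimensional embeddings; a real encoder outputs far larger vectors.
label_embs = {
    "climb": [1.0, 0.0, 0.0],
    "jump": [0.0, 1.0, 0.0],
    "wave": [0.0, 0.0, 1.0],
}
probs = classify([0.1, 0.9, 0.2], label_embs)
print(max(probs, key=probs.get))  # jump
```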

Use Cases

Video understanding
Human action recognition
Recognizing human action categories in videos
Achieves 57.3% top-1 accuracy on the HMDB-51 dataset
Video retrieval
Text-based video retrieval
Retrieving relevant video clips based on text descriptions
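Text-based retrieval can be sketched as ranking stored video embeddings by their cosine similarity to a text query embedding. The clip ids, vectors, and the `retrieve` helper below are hypothetical; in practice the embeddings would come from the model's video and text encoders.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def retrieve(query_emb, video_embs, top_k=2):
    """Return the top_k video ids ranked by cosine similarity to the query."""
    q = normalize(query_emb)
    scored = []
    for video_id, emb in video_embs.items():
        v = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, v)), video_id))
    scored.sort(reverse=True)  # highest similarity first
    return [video_id for _, video_id in scored[:top_k]]

# Hypothetical precomputed video embeddings.
video_embs = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.2, 0.8, 0.1],
    "clip_c": [0.0, 0.3, 0.9],
}
print(retrieve([0.1, 1.0, 0.0], video_embs))  # ['clip_b', 'clip_c']
```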