X-CLIP Base Patch32
X-CLIP extends CLIP to general video-language understanding. It is trained on (video, text) pairs via contrastive learning and suits tasks such as video classification and video-text retrieval.
Downloads: 309.80k
Release Date: 8/25/2022
Model Overview
The X-CLIP model (base size, 32x32 patch resolution) was trained in a fully supervised manner on the Kinetics-400 dataset. It can be used for zero-shot, few-shot, or fully supervised video classification, as well as for video-text retrieval.
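As a minimal sketch of zero-shot video classification with this model, the snippet below uses the Hugging Face `transformers` library. The checkpoint name `microsoft/xclip-base-patch32`, the candidate labels, and the random dummy frames are illustrative assumptions; a real application would decode actual video frames.

```python
import numpy as np
import torch
from transformers import AutoProcessor, AutoModel

# Assumed checkpoint name for the model described above.
ckpt = "microsoft/xclip-base-patch32"
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

# This checkpoint expects 8 frames per clip; random frames stand in
# for a real decoded video here.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))

labels = ["playing basketball", "cooking", "walking the dog"]  # hypothetical label set
inputs = processor(text=labels, videos=video, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Video-to-text similarity logits -> probabilities over the label set.
probs = outputs.logits_per_video.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```

Because the label set is supplied at inference time as free-form text, no retraining is needed to classify against a new set of categories, which is what makes the zero-shot setting possible.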
Model Features
Video-Language Understanding
Extends CLIP's contrastive image-text framework to (video, text) pairs, enabling general video-language understanding.
Multi-Task Support
Supports zero-shot, few-shot, and fully supervised video classification, as well as video-text retrieval.
Efficient Training
Uses 8 frames per video at 224x224 resolution during training, keeping training computationally efficient (see the frame-sampling sketch below).
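To illustrate the 8-frame input format, the helper below uniformly samples 8 frames from a decoded video array. The function name and the uniform-sampling strategy are assumptions for illustration; the released preprocessing pipeline may sample frames differently, but the checkpoint expects 8 frames at 224x224.

```python
import numpy as np

def sample_8_frames(frames: np.ndarray) -> np.ndarray:
    """Uniformly pick 8 frames from a (T, H, W, 3) video array.

    Hypothetical helper: shown only to make the expected
    8-frame, 224x224 input shape concrete.
    """
    t = frames.shape[0]
    indices = np.linspace(0, t - 1, num=8).round().astype(int)
    return frames[indices]

# Example: a 120-frame dummy clip reduced to the 8-frame input.
clip = np.zeros((120, 224, 224, 3), dtype=np.uint8)
print(sample_8_frames(clip).shape)  # (8, 224, 224, 3)
```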
Model Capabilities
Video Classification
Video-Text Retrieval
Zero-Shot Learning
Few-Shot Learning
Use Cases
Video Understanding
Video Classification
Classify video content to identify actions or scenes.
Achieves 80.4% top-1 accuracy and 95.0% top-5 accuracy on the Kinetics-400 dataset.
Video-Text Retrieval
Retrieve relevant videos from text queries, or retrieve matching text descriptions for a given video.
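A sketch of text-to-video retrieval with the same checkpoint: embed the videos and the text query separately, then rank videos by cosine similarity. The dummy video collection and the query string are placeholders, not part of the original model card.

```python
import numpy as np
import torch
from transformers import AutoProcessor, AutoModel

ckpt = "microsoft/xclip-base-patch32"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

# Two dummy 8-frame clips stand in for a real decoded video collection.
videos = [list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))
          for _ in range(2)]

with torch.no_grad():
    video_inputs = processor(videos=videos, return_tensors="pt")
    video_emb = model.get_video_features(**video_inputs)

    text_inputs = processor(text=["a person dancing"],
                            return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)

# Normalize and rank videos by cosine similarity to the text query.
video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (text_emb @ video_emb.T).squeeze(0)
print(scores.argsort(descending=True).tolist())  # best-matching video first
```

In practice, video embeddings would be precomputed and indexed once, so each text query only requires a single text-encoder forward pass followed by a similarity search.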