X-CLIP Open-source Video Language Understanding Model - Supports Free Deployment of Multi-modal Video Classification Tasks

Xclip Base Patch16 Ucf 16 Shot

Developed by Microsoft

X-CLIP is an extended version of CLIP for general video-language understanding, supporting zero-shot, few-shot, or fully supervised video classification tasks.

Video Processing

Transformers

English

Open Source License: MIT

#Video Classification #Few-shot Learning #High Accuracy

Downloads 92

Release Time: 9/7/2022

Model Overview

The X-CLIP model was trained in a few-shot manner (K=16) on the UCF101 dataset, primarily for video classification and video-text retrieval tasks.

Model Features

Few-shot Learning

This model was trained using only 16 labeled videos per class (K=16), demonstrating strong few-shot learning capabilities.
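To make the K=16 setting concrete, the sketch below draws a K-shot training subset, reading "K-shot" as K labeled videos per class. The function name `sample_k_shot` and the `(video_id, label)` layout are illustrative assumptions, not the authors' actual data pipeline.

```python
import random
from collections import defaultdict


def sample_k_shot(samples, k=16, seed=0):
    """Return up to k (video_id, label) pairs for every class.

    samples: iterable of (video_id, label) pairs (hypothetical layout).
    """
    by_label = defaultdict(list)
    for video_id, label in samples:
        by_label[label].append(video_id)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for label in sorted(by_label):
        ids = by_label[label]
        # Classes with fewer than k videos contribute everything they have.
        chosen = rng.sample(ids, min(k, len(ids)))
        subset.extend((video_id, label) for video_id in chosen)
    return subset
```

Under this reading, UCF101's 101 classes with K=16 give at most 101 × 16 = 1,616 training videos.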

Video-Text Contrastive Learning

Trained in a contrastive manner on (video, text) pairs, supporting video-text matching tasks.
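As a rough illustration of how a contrastively trained model scores one video against several text prompts, the sketch below applies a softmax to scaled cosine similarities. The embeddings, the function names, and the `logit_scale` value of 100.0 are assumptions chosen for illustration. X-CLIP's real embeddings and learned temperature come from the model itself.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def match_probabilities(video_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one video and N texts."""
    v = l2_normalize(video_emb)
    logits = []
    for text_emb in text_embs:
        t = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(v, t)))

    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The text whose embedding points most nearly in the same direction as the video embedding receives the highest probability.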

High Accuracy

Achieves a top-1 accuracy of 91.4% on the UCF101 dataset, demonstrating excellent performance.

Model Capabilities

Video Classification

Video-Text Retrieval

Few-shot Learning

Use Cases

Video Understanding

Video Classification

Classify video content, suitable for scenarios such as video content management and recommendation systems.

Achieves a top-1 accuracy of 91.4% on the UCF101 dataset.
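A minimal sketch of running classification through the Hugging Face Transformers X-CLIP classes follows. The hub ID `microsoft/xclip-base-patch16-ucf-16-shot` is inferred from the model name and developer listed on this page, and `sample_frame_indices` and `classify_video` are hypothetical helper names. The number of frames passed in must match what the checkpoint expects, so check the checkpoint's configuration before relying on this.

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick num_frames indices spread evenly across a clip of total_frames."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    step = total_frames / num_frames
    # Take the middle of each of the num_frames equal-length segments.
    return [min(total_frames - 1, int(step * i + step / 2)) for i in range(num_frames)]


def classify_video(frames, labels,
                   checkpoint="microsoft/xclip-base-patch16-ucf-16-shot"):
    """Return {label: probability} for one video.

    frames: list of HxWx3 uint8 arrays, already sampled to the frame count
            the checkpoint was trained with (assumption: caller handles this).
    labels: list of candidate class names used as text prompts.
    """
    # Imported lazily so sample_frame_indices works without these packages.
    import torch
    from transformers import XCLIPModel, XCLIPProcessor

    processor = XCLIPProcessor.from_pretrained(checkpoint)
    model = XCLIPModel.from_pretrained(checkpoint)

    inputs = processor(text=labels, videos=list(frames),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_video has shape (num_videos, num_labels); one video here.
    probs = outputs.logits_per_video.softmax(dim=1)[0]
    return {label: float(p) for label, p in zip(labels, probs)}
```

Decoding the video into frames is left to the caller; any decoder that yields RGB arrays should work with `sample_frame_indices` to choose which frames to keep.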

Video-Text Retrieval

Retrieve relevant videos based on text descriptions, suitable for video search and content moderation scenarios.
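Text-to-video retrieval can be sketched as ranking precomputed video embeddings by cosine similarity to a query embedding. The `retrieve` function and the dictionary-based index are illustrative assumptions. In practice the embeddings would come from the model's text and video encoders, and a large collection would call for an approximate nearest-neighbor index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, video_index, top_k=3):
    """Return the top_k (video_id, score) pairs, best match first.

    video_index: dict mapping video_id -> embedding (hypothetical layout).
    """
    scored = [(video_id, cosine(query_emb, emb))
              for video_id, emb in video_index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```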

Property	Details
Model Type	X-CLIP (base-sized model)
Training Data	UCF101
