Xclip Base Patch16 Zero Shot

Developed by aurelio-ai
X-CLIP is a minimalist extension of CLIP for general video-language understanding, trained via contrastive learning to match videos and texts.
Downloads 22
Release Time: 11/8/2023

Model Overview

The X-CLIP model (base size, 16x16 patch resolution) is trained on Kinetics-400 and suitable for zero-shot, few-shot, or fully supervised video classification, as well as video-text retrieval tasks.
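To make "16x16 patch resolution" concrete, the sketch below counts how many patches one frame is split into. The 224x224 frame size is an assumption (a common input size for base-size vision transformers) and is not stated in this card; `num_patches` is a hypothetical helper, not part of any library.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square frame."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


IMAGE_SIZE = 224  # assumption: not stated in this card
PATCH_SIZE = 16   # from the model name (patch16)

patches_per_frame = num_patches(IMAGE_SIZE, PATCH_SIZE)
print(patches_per_frame)  # 14 * 14 = 196
```

Under that assumption each frame becomes 196 patch tokens, so the token count per video grows linearly with the number of sampled frames.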

Model Features

Zero-Shot Video Classification
Classifies videos without task-specific fine-tuning, i.e. in a zero-shot setting.
Video-Text Matching
Scores how well a text description matches a given video.
Multi-Task Support
Supports various tasks including video classification and video-text retrieval.
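The zero-shot classification feature can be pictured as comparing one video embedding against the text embedding of each candidate label. The following is a minimal sketch of that scoring step only, using toy 3-dimensional vectors in place of real model outputs. The function names, the toy embeddings, and the `scale` value of 100.0 are illustrative assumptions, not values taken from this model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(video_emb, label_embs, scale=100.0):
    """Map each label to a probability from scaled cosine similarity.

    scale is a hypothetical temperature; real models learn their own.
    """
    labels = list(label_embs)
    logits = [scale * cosine(video_emb, label_embs[name]) for name in labels]
    return dict(zip(labels, softmax(logits)))


# Toy embeddings standing in for encoder outputs (assumed values).
video_emb = [0.9, 0.1, 0.0]
label_embs = {
    "playing guitar": [1.0, 0.0, 0.0],
    "riding a bike": [0.0, 1.0, 0.0],
    "cooking": [0.0, 0.0, 1.0],
}

probs = zero_shot_classify(video_emb, label_embs)
best = max(probs, key=probs.get)
print(best)  # playing guitar
```

Because the labels enter only as text embeddings, the label set can be changed at inference time without retraining, which is what makes the setting zero-shot.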

Model Capabilities

Video Classification
Video-Text Retrieval
Zero-Shot Learning

Use Cases

Video Understanding
Video Classification
Classify video content, for example for action recognition or scene recognition.
Zero-shot top-1 accuracy: 44.6% on HMDB-51, 72.0% on UCF-101, and 65.2% on Kinetics-600.
Video-Text Retrieval
Retrieve relevant video content based on text descriptions.
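Video-text retrieval reverses the direction of classification: a single text query is compared against many video embeddings and the videos are ranked by similarity. The sketch below illustrates only the ranking step, assuming the embeddings have already been computed. The file names, vectors, and the `rank_videos` helper are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_videos(text_emb, video_embs, top_k=2):
    """Return the top_k (video_id, similarity) pairs, best match first."""
    query = normalize(text_emb)
    scored = []
    for video_id, emb in video_embs.items():
        unit = normalize(emb)
        similarity = sum(q * v for q, v in zip(query, unit))
        scored.append((video_id, similarity))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for encoder outputs (assumed values).
text_emb = [0.2, 0.9, 0.1]  # e.g. the query "a person riding a bike"
video_embs = {
    "clip_guitar.mp4": [1.0, 0.1, 0.0],
    "clip_bike.mp4": [0.1, 1.0, 0.0],
    "clip_cooking.mp4": [0.0, 0.2, 1.0],
}

results = rank_videos(text_emb, video_embs)
for video_id, similarity in results:
    print(video_id, round(similarity, 3))
```

In a real index the video embeddings would typically be normalized once and stored, so that each query costs only one dot product per video.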