# Video-text contrastive learning
Xclip Large Patch14 Kinetics 600
MIT
X-CLIP is an extended version of CLIP for general video-language understanding, trained on video-text pairs through contrastive learning.
Text-to-Video
Transformers English

microsoft
124
5
Xclip Base Patch16 Hmdb 4 Shot
MIT
X-CLIP is a minimal extension of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs.
Video-to-Text
Transformers English

microsoft
22
1
Xclip Base Patch16
MIT
X-CLIP is an extended version of CLIP for general video-language understanding, trained via contrastive learning on (video, text) pairs, suitable for tasks like video classification and video-text retrieval.
Text-to-Video
Transformers English

microsoft
1,647
4
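
The X-CLIP entries above all describe training with a contrastive objective on (video, text) pairs and using the result for video classification and video-text retrieval. The sketch below illustrates that general idea in plain Python. It is not the X-CLIP implementation: the function names (`contrastive_loss`, `classify`), the symmetric InfoNCE-style form of the loss, the temperature of 0.07, and the toy 2-dimensional embeddings are all assumptions chosen for illustration. A real model would produce the embeddings with its video and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss (an assumed, generic formulation).

    Row i of video_embs is treated as the match for row i of text_embs;
    every other row in the batch acts as a negative.
    """
    n = len(video_embs)
    sims = [[cosine(v, t) / temperature for t in text_embs] for v in video_embs]
    # Video-to-text direction: each video should pick out its own caption.
    v2t = -sum(math.log(softmax(sims[i])[i]) for i in range(n)) / n
    # Text-to-video direction: each caption should pick out its own video.
    t2v = -sum(
        math.log(softmax([sims[j][i] for j in range(n)])[i]) for i in range(n)
    ) / n
    return (v2t + t2v) / 2


def classify(video_emb, label_embs, temperature=0.07):
    """Classify a video by similarity to text embeddings of candidate labels."""
    scores = softmax([cosine(video_emb, t) / temperature for t in label_embs])
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy embeddings (hypothetical values standing in for encoder outputs).
video_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]   # caption i matches video i
swapped_text = [[0.1, 0.9], [0.9, 0.1]]   # captions deliberately mismatched

loss_aligned = contrastive_loss(video_embs, aligned_text)
loss_swapped = contrastive_loss(video_embs, swapped_text)
predicted_label, label_probs = classify(video_embs[0], aligned_text)
```

With these toy inputs the loss is lower for the aligned pairs than for the swapped ones, which is the property the training objective rewards. The same similarity scores serve retrieval: ranking videos for a text query uses the text-to-video direction, and ranking captions for a video uses the other.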