MetaCLIP L14 400M
MetaCLIP is a vision-language model trained on curated CommonCrawl data that embeds images and text in a shared embedding space.
Release Time: 10/9/2023
Model Overview
This model builds a shared embedding space for images and text. It comes from work that analyzes and reproduces CLIP's training-data curation (filtering) method on CommonCrawl, and it supports a variety of cross-modal tasks.
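A minimal usage sketch is shown below. It assumes the checkpoint is published on the Hugging Face Hub as facebook/metaclip-l14-400m (verify the exact id) and loads it through the standard transformers CLIP classes; the image URL is just a placeholder.

from PIL import Image
import requests
import torch
from transformers import AutoModel, AutoProcessor

checkpoint = "facebook/metaclip-l14-400m"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Placeholder example image; any RGB image works.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# One similarity score per candidate caption, normalized to probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))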
Model Features
Large-scale data training
Trained on 400 million image-text pairs curated from CommonCrawl
Cross-modal understanding
Constructs a shared embedding space for images and text
Zero-shot capability
Supports zero-shot classification without task-specific training
Model Capabilities
Image classification
Text-to-image retrieval (see the retrieval sketch after this list)
Image-to-text retrieval
Cross-modal understanding
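Text-to-image retrieval can be sketched as follows: embed the candidate images and the query text separately, then rank the images by cosine similarity to the query. The checkpoint id is the same assumption as above, and the image paths are hypothetical.

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

checkpoint = "facebook/metaclip-l14-400m"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Hypothetical local images to search over.
paths = ["beach.jpg", "city.jpg", "forest.jpg"]
images = [Image.open(p) for p in paths]

with torch.no_grad():
    image_inputs = processor(images=images, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)
    text_inputs = processor(text=["a sunny beach"], return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**text_inputs)

# Normalize embeddings and rank images by cosine similarity to the query.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = (text_embeds @ image_embeds.T).squeeze(0)
for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")

Image-to-text retrieval is the same procedure with the roles reversed: embed one image and many candidate captions, then rank the captions.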
Use Cases
Content retrieval
Text-based image search
Retrieve relevant images using natural language descriptions
Content classification
Zero-shot image classification
Classify images into new categories without additional training; a sketch follows below
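Zero-shot classification follows the standard CLIP recipe: wrap each candidate class name in a prompt template, score the image against every prompt, and take the highest-scoring class. A minimal sketch, again assuming the facebook/metaclip-l14-400m checkpoint id and a hypothetical input file:

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

checkpoint = "facebook/metaclip-l14-400m"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Categories the model was never explicitly trained to classify.
classes = ["airplane", "bicycle", "sailboat"]
prompts = [f"a photo of a {c}" for c in classes]

image = Image.open("example.jpg")  # hypothetical input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Probabilities over the candidate classes; argmax gives the prediction.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
print(classes[int(probs.argmax())], f"{probs.max().item():.2%}")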