MetaCLIP B32 400M
The MetaCLIP base model (ViT-B/32) is a vision-language model trained on image-text data curated from CommonCrawl to build a shared image-text embedding space.
Downloads: 135.37k
Release Date: 10/7/2023
Model Overview
This model applies the MetaCLIP data-curation method to 400 million CommonCrawl image-text pairs, and supports tasks such as zero-shot image classification and text-based image retrieval.
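A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as `facebook/metaclip-b32-400m` and loads through the standard CLIP classes in `transformers`:

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Assumed Hub ID for this checkpoint; MetaCLIP uses the CLIP architecture.
model_id = "facebook/metaclip-b32-400m"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Example image (two cats) from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Score the image against free-form text labels: zero-shot classification.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image, return_tensors="pt", padding=True,
)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # per-label probabilities
print(probs)
```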
Model Features
Large-scale Data Training
Trained on 400 million image-text pairs curated from CommonCrawl, yielding strong generalization
Zero-shot Learning Capability
Capable of performing various vision tasks without task-specific fine-tuning
Shared Embedding Space
Maps images and text into a unified representation space, enabling cross-modal retrieval, as sketched after this list
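A sketch of what the shared space provides: both encoders project into vectors of the same dimensionality, so after L2-normalization a plain dot product is a cross-modal cosine similarity. The Hub ID and example image URL are the same assumptions as in the quickstart above.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "facebook/metaclip-b32-400m"  # assumed Hub ID
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    # Both encoders project into the same embedding space.
    text_emb = model.get_text_features(
        **processor(text=["two cats on a couch"], return_tensors="pt", padding=True))
    image_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))

# L2-normalize so the dot product equals cosine similarity.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
print(f"text-image cosine similarity: {(text_emb @ image_emb.T).item():.3f}")
```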
Model Capabilities
Zero-shot Image Classification
Text-based Image Retrieval
Image-based Text Retrieval
Cross-modal Representation Learning
Use Cases
Content Retrieval
Image Search Engine
Retrieve relevant images using natural language descriptions
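One way such an engine can work, sketched over a hypothetical local image collection (`image_paths` and the query string are placeholders): embed the collection once, then rank it against each text query's embedding.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "facebook/metaclip-b32-400m"  # assumed Hub ID
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Hypothetical local image collection.
image_paths = ["dog.jpg", "beach.jpg", "skyline.jpg"]
images = [Image.open(p) for p in image_paths]

with torch.no_grad():
    # Index step: embed all images once, normalize for cosine similarity.
    image_embs = model.get_image_features(**processor(images=images, return_tensors="pt"))
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)

    # Query step: embed the natural-language description.
    query_emb = model.get_text_features(
        **processor(text=["a sunset over the ocean"], return_tensors="pt", padding=True))
    query_emb = query_emb / query_emb.norm(dim=-1, keepdim=True)

# Rank images by cosine similarity to the query.
scores = (query_emb @ image_embs.T).squeeze(0)
for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```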
Content Classification
Zero-shot Image Classification
Classify images of new categories without training
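For this use case, the zero-shot image classification pipeline in `transformers` offers a higher-level wrapper; a minimal sketch, again assuming the `facebook/metaclip-b32-400m` Hub ID (`cat.jpg` is a placeholder path):

```python
from transformers import pipeline

# The pipeline bundles the model and processor behind one call.
classifier = pipeline(
    "zero-shot-image-classification",
    model="facebook/metaclip-b32-400m",  # assumed Hub ID
)

# Candidate labels can be arbitrary new categories; no fine-tuning needed.
results = classifier("cat.jpg", candidate_labels=["cat", "dog", "bicycle"])
print(results)  # list of {"label", "score"} dicts, sorted by score
```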