MetaCLIP B32 FullCC2.5B
MetaCLIP is a vision-language model trained on 2.5 billion image-text pairs curated from CommonCrawl (CC) to construct a shared image-text embedding space.
Release date: 10/7/2023
Model Overview
Developed by Meta AI, this model aims to demystify the data curation behind CLIP's training set and supports tasks such as zero-shot image classification and text-based image retrieval.
Model Features
Large-scale training data
Trained on 2.5 billion image-text pairs curated from CommonCrawl, covering a wide range of visual concepts
Open data process
Publicly documents the data curation method behind CLIP-style training, improving transparency
Multimodal embedding space
Constructs a unified image-text embedding space that supports cross-modal retrieval (a minimal usage sketch follows this list)
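The sketch below shows how the shared embedding space can be queried: both modalities are projected into the same vector space, so a normalized dot product gives a cosine similarity between an image and a caption. It assumes the checkpoint is published on the Hugging Face Hub as `facebook/metaclip-b32-fullcc2.5b` and loads with the standard CLIP classes in `transformers`; the image URL is just an example.

```python
# Minimal sketch: extract aligned image and text embeddings from the shared space.
# Assumes the Hub id "facebook/metaclip-b32-fullcc2.5b" and the standard CLIP API.
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "facebook/metaclip-b32-fullcc2.5b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    image_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
    text_emb = model.get_text_features(
        **processor(text=["two cats on a couch"], return_tensors="pt", padding=True)
    )

# Normalize so dot products are cosine similarities in the shared space.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print((image_emb @ text_emb.T).item())  # higher = better image-text match
```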
Model Capabilities
Zero-shot image classification
Text-based image retrieval
Image-based text retrieval
Cross-modal feature extraction
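As a concrete example of the first capability, the following sketch classifies an image zero-shot: each candidate label is phrased as a caption, and the model's image-text similarity logits are softmaxed into class probabilities. The Hub id and the example image URL are the same assumptions as above.

```python
# Minimal zero-shot classification sketch, assuming the Hub id
# "facebook/metaclip-b32-fullcc2.5b"; labels are scored by image-text similarity.
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "facebook/metaclip-b32-fullcc2.5b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds one similarity score per label; softmax -> probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

No cat/dog/car classifier was ever trained here; the labels can be swapped for any category names, which is what makes the classification zero-shot.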
Use Cases
Content retrieval
Image search engine
Retrieve relevant images using natural-language descriptions (see the sketch below)
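A small retrieval sketch under the same Hub-id assumption: embed a collection of images once, embed the text query, and rank images by cosine similarity. The file paths here are hypothetical placeholders; a real search engine would precompute and index the image embeddings.

```python
# Sketch of text-to-image retrieval over a small local collection.
# Assumes the Hub id "facebook/metaclip-b32-fullcc2.5b"; image paths are placeholders.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "facebook/metaclip-b32-fullcc2.5b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

paths = ["img1.jpg", "img2.jpg", "img3.jpg"]  # hypothetical local files
images = [Image.open(p) for p in paths]

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    txt_emb = model.get_text_features(
        **processor(text=["a red bicycle leaning against a wall"],
                    return_tensors="pt", padding=True)
    )

img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

# Rank the collection by cosine similarity to the query text.
scores = (txt_emb @ img_emb.T).squeeze(0)
for idx in scores.argsort(descending=True).tolist():
    print(paths[idx], scores[idx].item())
```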
Intelligent classification
Zero-shot image classification
Classify images into new categories without task-specific training