MobileCLIP-B-LT-OpenCLIP: Open-Source Image-Text Model Developed by Apple for Fast Zero-Shot Image Classification

MobileCLIP-B (LT) OpenCLIP

Developed by Apple

MobileCLIP-B (LT) is an efficient image-text model developed by Apple. Trained with multi-modal reinforced training, it delivers fast zero-shot image classification and reaches 77.2% zero-shot top-1 accuracy on ImageNet-1k.

Zero-Shot Image Classification

Safetensors

#Zero-shot image classification #Fast image-text matching #Low-latency inference

Downloads 774

Release Time: 6/7/2024

Model Overview

MobileCLIP is a family of fast image-text models designed for zero-shot image classification. Its efficiency comes from an optimized architecture combined with multi-modal reinforced training.

Model Features

Efficient performance

The smaller variants run 2.3x to 4.8x faster than comparable ViT-B/16 models while matching or exceeding their zero-shot performance

Compact size

The smaller variants are 2.1x to 2.8x smaller than comparable ViT-B/16 models

Reinforced training

Uses multi-modal reinforced training; MobileCLIP-B (LT) is trained on 36B seen samples

Zero-shot capability

Optimized for zero-shot image classification tasks without task-specific fine-tuning
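
The zero-shot capability listed above can be illustrated with a minimal sketch. It assumes the usual CLIP-style recipe: embed the image and one text prompt per class, compare them by cosine similarity, and apply a softmax. The toy 3-dimensional vectors, the prompt wording, and the logit scale of 100 are assumptions for illustration only; they are not outputs or settings taken from MobileCLIP.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings standing in for real encoder outputs.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.2, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best)
```

No class-specific training happens here: changing the label set only means embedding a different list of prompts.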

Model Capabilities

Zero-shot image classification

Multimodal understanding

Fast inference

Use Cases

Computer vision

Image classification

Classify images without specific training

Achieves 77.2% zero-shot top-1 accuracy on ImageNet-1k

Multimodal retrieval

Enable cross-modal image-text retrieval
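
As a rough sketch of cross-modal retrieval, the snippet below ranks a small gallery of image embeddings against a text-query embedding by cosine similarity. The file names and vectors are hypothetical placeholders; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, gallery, k=2):
    """Return the ids of the k gallery items most similar to the query."""
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine(query_emb, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical image embeddings keyed by file name.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of a text query.
query = [0.8, 0.3, 0.0]

top = retrieve(query, gallery, k=2)
print(top)
```

The same function works in the other direction (image query, text gallery) because both encoders map into one shared embedding space.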

Mobile applications

Mobile image recognition

Lightweight image recognition suitable for deployment on mobile devices

Low latency (10.4 ms for the image encoder + 3.3 ms for the text encoder)

🚀 MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

MobileCLIP is a fast image-text model developed through multi-modal reinforced training, which offers high-performance zero-shot image classification capabilities.

🚀 Quick Start

MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.

This repository contains the MobileCLIP-B (LT) checkpoint for OpenCLIP.
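
A minimal sketch of how the checkpoint might be used from OpenCLIP follows. It assumes the `open_clip_torch`, `torch`, and `Pillow` packages are installed and that the weights are published on the Hugging Face Hub under the id shown in `HUB_ID`; that id is an assumption, not something stated on this page. `classify` is a hypothetical helper name, and the logit scale of 100 is a common CLIP convention rather than a documented setting of this model.

```python
# Assumed Hub id for this checkpoint; adjust it if the repository name differs.
HUB_ID = "hf-hub:apple/MobileCLIP-B-LT-OpenCLIP"


def classify(image_path, labels, hub_id=HUB_ID):
    """Return {label: probability} for one image (hypothetical helper).

    Weights are downloaded on the first call.
    """
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import torch
    import open_clip
    from PIL import Image

    # create_model_and_transforms returns (model, train_transform, eval_transform).
    model, _, preprocess = open_clip.create_model_and_transforms(hub_id)
    tokenizer = open_clip.get_tokenizer(hub_id)
    model.eval()

    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer(labels)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize so the matrix product below yields cosine similarities.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)[0]

    return dict(zip(labels, probs.tolist()))
```

A call such as `classify("photo.jpg", ["a photo of a cat", "a photo of a dog"])` would return one probability per prompt, where `photo.jpg` is a placeholder path.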

MobileCLIP Performance Figure

✨ Features

Our smallest variant MobileCLIP-S0 obtains similar zero-shot performance to OpenAI's ViT-B/16 model while being 4.8x faster and 2.8x smaller.
MobileCLIP-S2 obtains better average zero-shot performance than SigLIP's ViT-B/16 model while being 2.3x faster and 2.1x smaller, and is trained with 3x fewer seen samples.
MobileCLIP-B (LT) attains zero-shot ImageNet performance of 77.2%, which is significantly better than recent works like DFN and SigLIP with similar architectures, or even OpenAI's ViT-L/14@336.

📚 Documentation

Checkpoints

Model	# Seen Samples (B)	# Params (M) (img + txt)	Latency (ms) (img + txt)	IN-1k Zero-Shot Top-1 Acc. (%)	Avg. Perf. (%) on 38 datasets
[MobileCLIP-S0](https://hf.co/pcuenq/MobileCLIP-S0)	13	11.4 + 42.4	1.5 + 1.6	67.8	58.1
[MobileCLIP-S1](https://hf.co/pcuenq/MobileCLIP-S1)	13	21.5 + 63.4	2.5 + 3.3	72.6	61.3
[MobileCLIP-S2](https://hf.co/pcuenq/MobileCLIP-S2)	13	35.7 + 63.4	3.6 + 3.3	74.4	63.7
[MobileCLIP-B](https://hf.co/pcuenq/MobileCLIP-B)	13	86.3 + 63.4	10.4 + 3.3	76.8	65.2
[MobileCLIP-B (LT)](https://hf.co/pcuenq/MobileCLIP-B-LT)	36	86.3 + 63.4	10.4 + 3.3	77.2	65.8
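
To make the trade-offs in the checkpoint figures concrete, the sketch below encodes the reported parameter counts, latencies, and ImageNet-1k accuracies, then picks the most accurate variant whose combined image and text latency fits a given budget. The `VARIANTS` layout and the `best_under_budget` helper are illustrative assumptions, not part of the MobileCLIP or OpenCLIP code.

```python
# name: (img params M, txt params M, img latency ms, txt latency ms, IN-1k top-1 %)
VARIANTS = {
    "MobileCLIP-S0": (11.4, 42.4, 1.5, 1.6, 67.8),
    "MobileCLIP-S1": (21.5, 63.4, 2.5, 3.3, 72.6),
    "MobileCLIP-S2": (35.7, 63.4, 3.6, 3.3, 74.4),
    "MobileCLIP-B": (86.3, 63.4, 10.4, 3.3, 76.8),
    "MobileCLIP-B (LT)": (86.3, 63.4, 10.4, 3.3, 77.2),
}


def total_latency_ms(name):
    """Image-encoder plus text-encoder latency for one variant."""
    _, _, img_ms, txt_ms, _ = VARIANTS[name]
    return img_ms + txt_ms


def best_under_budget(budget_ms):
    """Most accurate variant whose total latency fits the budget, else None."""
    fitting = [name for name in VARIANTS if total_latency_ms(name) <= budget_ms]
    if not fitting:
        return None
    return max(fitting, key=lambda name: VARIANTS[name][4])


for budget in (1.0, 7.0, 20.0):
    print(budget, best_under_budget(budget))
```

In zero-shot classification the text prompts can be embedded once and reused, so the per-image cost is closer to the image-encoder latency alone; the helper above takes the more conservative combined figure.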

📄 License

This project is released under the apple-amlr license.
