MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
MobileCLIP is a family of fast image-text models trained with multi-modal reinforced training. It offers high-performance zero-shot image classification, with variants that balance speed, size, and accuracy.
This repository contains the MobileCLIP-S1 checkpoint for OpenCLIP.

Quick Start
MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
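
Zero-shot classification with an image-text model such as MobileCLIP compares one image embedding against the embeddings of candidate text prompts and picks the closest prompt. The following is a minimal sketch of that comparison step only, not the repository's own code. The three-dimensional vectors are hand-written placeholders standing in for real encoder outputs, and the function names and the `logit_scale` default of 100.0 are assumptions made for this illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_embedding, text_embeddings, logit_scale=100.0):
    """Return one probability per candidate prompt for a single image.

    logit_scale is an assumed temperature for this sketch; a trained model
    carries its own learned value.
    """
    image = normalize(image_embedding)
    logits = []
    for text_embedding in text_embeddings:
        text = normalize(text_embedding)
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)
    return softmax(logits)


# Placeholder embeddings, NOT real model outputs. In practice these would come
# from the model's image encoder and text encoder.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image_embedding, text_embeddings)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], round(probs[best], 4))
```

Because the label set is just a list of prompts, changing the classes means re-encoding text only; no retraining is involved.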
Features
Highlights
- Our smallest variant MobileCLIP-S0 obtains similar zero-shot performance as OpenAI's ViT-B/16 model while being 4.8x faster and 2.8x smaller.
- MobileCLIP-S2 obtains better average zero-shot performance than SigLIP's ViT-B/16 model while being 2.3x faster and 2.1x smaller, and trained with 3x fewer seen samples.
- MobileCLIP-B (LT) attains zero-shot ImageNet performance of 77.2%, which is significantly better than recent works like DFN and SigLIP with similar architectures, or even OpenAI's ViT-L/14@336.
Documentation
Checkpoints

| Property | Details |
| --- | --- |
| Model Type | MobileCLIP has several variants, including MobileCLIP-S0, MobileCLIP-S1, MobileCLIP-S2, MobileCLIP-B, and MobileCLIP-B (LT) |
| # Seen Samples (B) | Ranges from 13 to 36 |
| # Params (M) (img + txt) | Varies depending on the model, e.g., 11.4 + 42.4 for MobileCLIP-S0 |
| Latency (ms) (img + txt) | Different for each model, e.g., 1.5 + 1.6 for MobileCLIP-S0 |
| IN-1k Zero-Shot Top-1 Acc. (%) | Ranges from 67.8% to 77.2% |
| Avg. Perf. (%) on 38 datasets | Ranges from 58.1% to 65.8% |

| Model | # Seen Samples (B) | # Params (M) (img + txt) | Latency (ms) (img + txt) | IN-1k Zero-Shot Top-1 Acc. (%) | Avg. Perf. (%) on 38 datasets |
| --- | --- | --- | --- | --- | --- |
| [MobileCLIP-S0](https://hf.co/pcuenq/MobileCLIP-S0) | 13 | 11.4 + 42.4 | 1.5 + 1.6 | 67.8 | 58.1 |
| [MobileCLIP-S1](https://hf.co/pcuenq/MobileCLIP-S1) | 13 | 21.5 + 63.4 | 2.5 + 3.3 | 72.6 | 61.3 |
| [MobileCLIP-S2](https://hf.co/pcuenq/MobileCLIP-S2) | 13 | 35.7 + 63.4 | 3.6 + 3.3 | 74.4 | 63.7 |
| [MobileCLIP-B](https://hf.co/pcuenq/MobileCLIP-B) | 13 | 86.3 + 63.4 | 10.4 + 3.3 | 76.8 | 65.2 |
| [MobileCLIP-B (LT)](https://hf.co/pcuenq/MobileCLIP-B-LT) | 36 | 86.3 + 63.4 | 10.4 + 3.3 | 77.2 | 65.8 |
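
To make the speed, size, and accuracy trade-off in the checkpoint figures concrete, here is a small sketch that encodes the reported numbers and picks the most accurate variant under a latency budget. The `Variant` class and `best_under_latency` helper are hypothetical names introduced for this example. Summing image and text latency is a simplifying assumption; for classification with a fixed label set, text embeddings can be computed once and reused.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    seen_samples_b: int               # seen samples, in billions
    params_m: Tuple[float, float]     # (image, text) parameters, in millions
    latency_ms: Tuple[float, float]   # (image, text) latency, in milliseconds
    in1k_top1: float                  # ImageNet-1k zero-shot top-1 accuracy (%)
    avg_38: float                     # average performance on 38 datasets (%)

    @property
    def total_params_m(self) -> float:
        return round(sum(self.params_m), 1)

    @property
    def total_latency_ms(self) -> float:
        return round(sum(self.latency_ms), 1)


# Values copied from the checkpoint table.
VARIANTS = [
    Variant("MobileCLIP-S0", 13, (11.4, 42.4), (1.5, 1.6), 67.8, 58.1),
    Variant("MobileCLIP-S1", 13, (21.5, 63.4), (2.5, 3.3), 72.6, 61.3),
    Variant("MobileCLIP-S2", 13, (35.7, 63.4), (3.6, 3.3), 74.4, 63.7),
    Variant("MobileCLIP-B", 13, (86.3, 63.4), (10.4, 3.3), 76.8, 65.2),
    Variant("MobileCLIP-B (LT)", 36, (86.3, 63.4), (10.4, 3.3), 77.2, 65.8),
]


def best_under_latency(budget_ms: float) -> Optional[Variant]:
    """Return the variant with the highest IN-1k accuracy whose combined
    image + text latency fits the budget, or None if nothing fits."""
    candidates = [v for v in VARIANTS if v.total_latency_ms <= budget_ms]
    return max(candidates, key=lambda v: v.in1k_top1, default=None)


for v in VARIANTS:
    print(f"{v.name}: {v.total_params_m} M params, {v.total_latency_ms} ms")

choice = best_under_latency(6.0)
print("Best under 6 ms:", choice.name if choice else "none")
```

With a 6 ms budget this selects MobileCLIP-S1, the checkpoint in this repository, since MobileCLIP-S2 totals 6.9 ms under the same assumption.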
License
This project is under the [apple-amlr](https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data) license.