MobileCLIP-B-OpenCLIP Open-Source Image-Text Model: Multi-Modal Training Empowers Fast, Accurate, and Stable Zero-Shot Image Classification

MobileCLIP-B-OpenCLIP

Developed by Apple

MobileCLIP-B is an efficient image-text model trained with multi-modal reinforced training. It offers low-latency inference and strong zero-shot image classification accuracy.

Zero-Shot Image Classification

Safetensors

#Zero-shot image classification #Low-latency inference #Multi-modal reinforced training

Downloads 715

Release Time: 6/7/2024

Model Overview

MobileCLIP is a family of fast image-text models designed for efficient zero-shot image classification. Trained with multi-modal reinforced training, the models reach accuracy comparable to larger models while staying compact.

Model Features

Efficient Performance

MobileCLIP-B reaches 76.8% zero-shot top-1 accuracy on ImageNet-1k with 86.3M image-encoder and 63.4M text-encoder parameters

Fast Inference

Combined image + text encoder latency of 13.7 ms for MobileCLIP-B (10.4 ms image + 3.3 ms text)

Multimodal Training

Uses multi-modal reinforced training to improve accuracy

Zero-shot Capability

Classifies categories it was never explicitly trained on; MobileCLIP-B averages 65.2% across 38 evaluation datasets

Model Capabilities

Zero-shot image classification

Image-text matching

Multimodal understanding

Use Cases

Computer Vision

Image Classification

Classifies images into categories described by text prompts, without task-specific training

Achieves 76.8% zero-shot top-1 accuracy on ImageNet-1k (MobileCLIP-B)
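
The zero-shot classification step can be sketched as follows. This is a minimal illustration, not the model's actual code: the embeddings are made-up toy vectors (in practice they would come from the image and text encoders), and the function names and the `logit_scale` value of 100 are assumptions chosen for the example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return a probability per text prompt for one image embedding.

    logit_scale is an assumed temperature; a real model supplies its own value.
    """
    image_emb = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        text_emb = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(image_emb, text_emb)))
    # Numerically stable softmax over the prompts.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data: hypothetical 3-dimensional embeddings, for illustration only.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
toy_text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
toy_image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_probs(toy_image_emb, toy_text_embs)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best])
```

Because the class names enter only as text prompts, changing the label set requires no retraining, only new text embeddings.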

Image-Text Retrieval

Retrieves relevant images based on text descriptions
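
Text-to-image retrieval reduces to ranking stored image embeddings by their similarity to a text query embedding. The sketch below assumes the embeddings are already computed; the file names, vectors, and the `retrieve` helper are hypothetical and exist only for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, image_index, top_k=2):
    """Return the names of the top_k images most similar to the query embedding."""
    scored = [(cosine(query_emb, emb), name) for name, emb in image_index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Toy index: hypothetical image embeddings keyed by file name.
toy_index = {
    "beach.jpg": [0.9, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.1, 0.2, 0.9],
}
toy_query = [0.7, 0.3, 0.1]  # stands in for the embedding of a text description

results = retrieve(toy_query, toy_index)
print(results)
```

Image embeddings can be computed once and stored, so only the text query needs to be encoded at search time.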

Mobile Applications

Mobile Visual Search

Implements efficient visual search functionality on mobile devices

Model	# Seen Samples (B)	# Params (M) (img + txt)	Latency (ms) (img + txt)	IN-1k Zero-Shot Top-1 Acc. (%)	Avg. Perf. (%) on 38 datasets
[MobileCLIP-S0](https://hf.co/pcuenq/MobileCLIP-S0)	13	11.4 + 42.4	1.5 + 1.6	67.8	58.1
[MobileCLIP-S1](https://hf.co/pcuenq/MobileCLIP-S1)	13	21.5 + 63.4	2.5 + 3.3	72.6	61.3
[MobileCLIP-S2](https://hf.co/pcuenq/MobileCLIP-S2)	13	35.7 + 63.4	3.6 + 3.3	74.4	63.7
[MobileCLIP-B](https://hf.co/pcuenq/MobileCLIP-B)	13	86.3 + 63.4	10.4 + 3.3	76.8	65.2
[MobileCLIP-B (LT)](https://hf.co/pcuenq/MobileCLIP-B-LT)	36	86.3 + 63.4	10.4 + 3.3	77.2	65.8
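
To compare variants by total cost, the image and text figures from the table can be summed per model. The short sketch below only does that arithmetic on the table's values; the dictionary layout and variable names are illustrative choices.

```python
# (image, text) values copied from the table above.
variants = {
    "MobileCLIP-S0": {"params_m": (11.4, 42.4), "latency_ms": (1.5, 1.6), "in1k_top1": 67.8},
    "MobileCLIP-S1": {"params_m": (21.5, 63.4), "latency_ms": (2.5, 3.3), "in1k_top1": 72.6},
    "MobileCLIP-S2": {"params_m": (35.7, 63.4), "latency_ms": (3.6, 3.3), "in1k_top1": 74.4},
    "MobileCLIP-B": {"params_m": (86.3, 63.4), "latency_ms": (10.4, 3.3), "in1k_top1": 76.8},
    "MobileCLIP-B (LT)": {"params_m": (86.3, 63.4), "latency_ms": (10.4, 3.3), "in1k_top1": 77.2},
}

totals = {}
for name, stats in variants.items():
    # Round to one decimal to match the table's precision and hide float noise.
    total_params = round(sum(stats["params_m"]), 1)
    total_latency = round(sum(stats["latency_ms"]), 1)
    totals[name] = (total_params, total_latency)
    print(f"{name}: {total_params} M params, {total_latency} ms, "
          f"{stats['in1k_top1']}% IN-1k top-1")
```

For MobileCLIP-B this gives 149.7 M parameters and 13.7 ms combined latency; MobileCLIP-S0 comes to 53.8 M parameters and 3.1 ms.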
