Aimv2 Large Patch14 224 Lit

Developed by Apple
AIMv2 is a series of vision models pretrained with multimodal autoregressive objectives, demonstrating outstanding performance across multiple multimodal understanding benchmarks.
Downloads 222
Release Time: 11/20/2024

Model Overview

AIMv2 is pretrained with multimodal autoregressive objectives, showcasing robust performance in tasks such as image classification and object detection.

Model Features

Multimodal Autoregressive Pretraining
Pretrained with autoregressive objectives over both image and text inputs to build strong multimodal understanding.
Exceptional Benchmark Performance
Outperforms OpenAI CLIP and SigLIP on most multimodal understanding benchmarks.
Powerful Recognition Performance
The 3B version achieves 89.5% accuracy on ImageNet when using a frozen backbone network.
Broad Application Capabilities
Surpasses DINOv2 in open-vocabulary object detection and referring expression comprehension tasks.

Model Capabilities

Zero-shot image classification
Multimodal understanding
Open-vocabulary object detection
Referring expression comprehension
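
To make the zero-shot image classification capability concrete, here is a minimal sketch of the scoring step that image-text models of this kind typically use: compare one image embedding against one text embedding per candidate label, then turn the similarities into probabilities. It does not call the AIMv2 model. The embeddings are made-up toy vectors, and the function names and the `logit_scale` value of 100 are assumptions chosen for illustration, not values taken from this model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, labels, logit_scale=100.0):
    """Return a {label: probability} dict for one image.

    logit_scale is a hypothetical temperature; real models learn or fix their own.
    """
    image = l2_normalize(image_embedding)
    logits = []
    for text_embedding in text_embeddings:
        text = l2_normalize(text_embedding)
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)
    return dict(zip(labels, softmax(logits)))


# Toy 3-dimensional embeddings (illustrative only, not produced by any model).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = [
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.2, 1.0],
]
labels = ["a photo of a cat", "a photo of a dog", "a photo of a horse"]

scores = zero_shot_classify(image_embedding, text_embeddings, labels)
best_label = max(scores, key=scores.get)
print(best_label)
```

In practice the embeddings would come from the model's image and text encoders. Because the label set is supplied as text at inference time, new classes can be added without retraining.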

Use Cases

Computer Vision
Image Classification
Classify and recognize image content
The 3B version reaches 89.5% accuracy on ImageNet with a frozen backbone
Object Detection
Detect specific objects in images
Outperforms DINOv2 in open-vocabulary object detection
Multimodal Applications
Image-Text Matching
Understand the relationship between images and text descriptions
Surpasses OpenAI CLIP and SigLIP on most multimodal understanding benchmarks