Aimv2 Large Patch14 Native
Developed by Apple
AIMv2 is a family of vision models pretrained with a multimodal autoregressive objective, achieving outstanding performance across a wide range of multimodal understanding benchmarks.
Downloads: 788
Release date: 11/21/2024
Model Overview
AIMv2 demonstrates exceptional performance in image feature extraction and multimodal understanding tasks through multimodal autoregressive pretraining.
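The snippet below is a minimal sketch of extracting image features with this checkpoint. It assumes the model is published on the Hugging Face Hub as apple/aimv2-large-patch14-native and is loadable through transformers' AutoModel with trust_remote_code=True; the exact class names and output fields may differ from the official usage.

```python
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Assumed Hub checkpoint name; adjust if the model is hosted elsewhere.
checkpoint = "apple/aimv2-large-patch14-native"

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True)

# Any RGB image works; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# One feature vector per image patch; pool them for a global image embedding.
patch_features = outputs.last_hidden_state   # (1, num_patches, hidden_dim), assumed output field
image_embedding = patch_features.mean(dim=1) # (1, hidden_dim)
print(image_embedding.shape)
```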
Model Features
Exceptional Multimodal Understanding
Outperforms OpenAI CLIP and SigLIP models in most multimodal understanding benchmarks
Powerful Recognition Performance
AIMv2-3B achieves 89.5% top-1 accuracy on ImageNet with a frozen backbone
Open Vocabulary Advantage
Surpasses DINOv2 in open vocabulary object detection and referring expression comprehension
Efficient Pretraining Method
Employs a simple multimodal autoregressive objective that is straightforward to train and scales effectively
Model Capabilities
Image Feature Extraction
Multimodal Understanding
Open Vocabulary Object Detection
Referring Expression Comprehension
Large-scale Visual Representation Learning
Use Cases
Computer Vision
Image Classification
Using frozen pretrained features for image classification tasks (see the linear-probe sketch at the end of this section)
Achieves 89.5% accuracy on ImageNet
Object Detection
Object detection in open vocabulary settings
Outperforms the DINOv2 model
Multimodal Applications
Vision-Language Understanding
Joint representation learning of images and text
Surpasses CLIP and SigLIP models
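As an illustration of the frozen-backbone evaluation referenced above (e.g. the 89.5% ImageNet figure for AIMv2-3B), the sketch below fits a simple linear probe on features extracted while the backbone stays frozen. It reuses the model and processor from the earlier snippet; train_images, train_labels, val_images, and val_labels are hypothetical placeholders for a labeled dataset, and the last_hidden_state output field is an assumption.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def extract_features(model, processor, images):
    """Mean-pool patch features from the frozen backbone into one vector per image."""
    feats = []
    model.eval()
    with torch.no_grad():
        for img in images:
            inputs = processor(images=img, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state  # assumed output field
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(feats)

# train_images / train_labels / val_images / val_labels are placeholders
# for a labeled image dataset; the backbone itself is never fine-tuned.
X_train = extract_features(model, processor, train_images)
X_val = extract_features(model, processor, val_images)

probe = LogisticRegression(max_iter=1000)  # the linear probe on frozen features
probe.fit(X_train, train_labels)
print("frozen-backbone accuracy:", probe.score(X_val, val_labels))
```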