AIMv2 1B Patch14 224
AIMv2 is a family of vision models pretrained with a multimodal autoregressive objective. It performs well on image classification, open-vocabulary object detection, and referring expression comprehension.
Downloads 299
Release Time : 10/29/2024
Model Overview
AIMv2 is a large-scale vision encoder pretrained with a multimodal autoregressive objective, used primarily for image feature extraction and classification. It reports 88.1% accuracy on ImageNet-1k and outperforms DINOv2 on open-vocabulary object detection and referring expression comprehension.
Model Features
Multimodal Autoregressive Pretraining
Pretrained with a multimodal autoregressive objective, in which the model learns to predict image patches and text tokens in sequence.
Exceptional Classification Performance
Achieves 88.1% accuracy on ImageNet-1k.
Cross-Task Generalization Capability
Outperforms DINOv2 in open-vocabulary object detection and referring expression comprehension tasks.
Model Capabilities
Image feature extraction
Image classification
Multimodal understanding
Open-vocabulary object detection
Referring expression comprehension
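As a rough illustration of the image feature extraction capability, the following is a minimal sketch, not an official usage recipe. It assumes the checkpoint is published on the Hugging Face Hub under the ID `apple/aimv2-1B-patch14-224`, that it loads through the `transformers` `AutoModel` and `AutoImageProcessor` classes with `trust_remote_code=True`, and that "Patch14 224" in the model name means 14-pixel patches on a 224x224 input. None of these details are stated on this page. The helper names `num_patch_tokens` and `extract_features` are hypothetical.

```python
def num_patch_tokens(image_size: int = 224, patch_size: int = 14) -> int:
    """Count the non-overlapping square patches that tile a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def extract_features(image_path: str,
                     model_id: str = "apple/aimv2-1B-patch14-224"):
    """Return encoder features for one image.

    The model ID is an assumption (not given on this page). The weights are
    downloaded on first use, so this needs network access and enough memory
    for a roughly 1B-parameter model.
    """
    # Imported lazily so num_patch_tokens works without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_id)
    # Assumption: the checkpoint ships custom modeling code on the Hub.
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Assumed shape: (1, number of patch tokens, hidden size).
    return outputs.last_hidden_state


if __name__ == "__main__":
    # 224 / 14 = 16 patches per side, so 16 * 16 patch tokens per image.
    print(num_patch_tokens(224, 14))
```

Under these assumptions, each image yields one feature vector per patch. Averaging those vectors is one simple way to get a single image-level embedding for downstream classification.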
Use Cases
Computer Vision
General Image Classification
Used for standard image classification tasks such as ImageNet.
88.1% accuracy on ImageNet-1k
Fine-Grained Classification
Used for fine-grained image classification, such as the Stanford Cars dataset.
96.5% accuracy on the Stanford Cars dataset
Medical Image Analysis
Used for medical image classification, such as the Camelyon17 dataset.
94.2% accuracy on Camelyon17
Remote Sensing Image Processing
Satellite Image Classification
Used for satellite image classification tasks such as EuroSAT.
98.8% accuracy on EuroSAT
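The accuracy figures above are presumably top-1 accuracy, meaning the share of images whose highest-scoring class matches the label. This page does not state the metric, so treat that as an assumption. The sketch below shows how such a number is computed from per-class scores. The helper `top1_accuracy` and the toy scores are hypothetical and are not results from the model.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("scores and labels must be non-empty and equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row (first one wins on ties).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy scores for 4 images over 3 classes (illustrative values only).
toy_scores = [
    [2.0, 0.1, 0.3],  # predicted class 0
    [0.2, 1.5, 0.1],  # predicted class 1
    [0.3, 0.2, 0.9],  # predicted class 2
    [1.2, 0.4, 0.8],  # predicted class 0
]
toy_labels = [0, 1, 2, 2]

if __name__ == "__main__":
    print(top1_accuracy(toy_scores, toy_labels))  # 3 of 4 correct -> 0.75
```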