AIMv2-Large-Patch14-Native Open-Source Image Classification Model - Free to Use and Accurately Identify Image Categories

Aimv2 Large Patch14 Native Image Classification

Developed by amaye15

AIMv2-Large-Patch14-Native is an adapted image classification model, modified from the original AIMv2 model to be compatible with Hugging Face Transformers' AutoModelForImageClassification class.

Image Classification

Transformers

Open Source License: MIT #Multimodal Pre-training #Open Vocabulary Classification #High-precision Visual Recognition

Downloads 15

Release Time: 11/25/2024

Model Overview

This model is an adapted version of the original AIMv2 model, modified to be compatible with Hugging Face Transformers' AutoModelForImageClassification class for image classification tasks.

Model Features

Multimodal Autoregressive Pre-training

The AIMv2 model is pre-trained with multimodal autoregressive objectives, demonstrating outstanding performance across various benchmarks.

Compatible with Hugging Face Transformers

After adaptation, this model can be directly used with AutoModelForImageClassification, making it easy to integrate into existing workflows.

High Performance

The AIMv2 series outperforms OpenAI CLIP and SigLIP on most multimodal understanding benchmarks and surpasses DINOv2 on open-vocabulary object detection and referring expression comprehension.

Model Capabilities

Image Classification

Visual Understanding

Use Cases

Computer Vision

General Image Classification

Classify input images to identify the main objects or scenes within them.

🚀 AIMv2-Large-Patch14-Native Image Classification

This repository provides an adapted AIMv2 model for seamless image classification using Hugging Face Transformers.

🚀 Quick Start

This repository contains an adapted version of the original AIMv2 model, modified to be compatible with the AutoModelForImageClassification class from Hugging Face Transformers. This adaptation enables seamless use of the model for image classification tasks.

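Note: This model has not been trained or fine-tuned for classification, so its predictions are not meaningful until the classification head is fine-tuned on labeled data.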
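Because the model has not been fine-tuned, a first step toward using it is usually to define the label set for your own task. The sketch below builds the two label mappings in plain Python. The helper name `build_label_maps` and the class names are hypothetical, and it is an assumption that this repository's custom code accepts `num_labels`, `id2label`, and `label2id` when loading, as standard Transformers configurations do.

```python
def build_label_maps(class_names):
    """Return (id2label, label2id) dicts for a list of class names."""
    # Reject duplicates so the two mappings stay inverse to each other.
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


# Hypothetical label set, for illustration only.
id2label, label2id = build_label_maps(["cat", "dog", "remote"])
print(id2label)
print(label2id)
```

If the assumption holds, these dictionaries could be passed to `from_pretrained` together with `num_labels=len(id2label)` before fine-tuning.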

✨ Features

- Adapted Compatibility: We have adapted the original apple/aimv2-large-patch14-native model to work with AutoModelForImageClassification.
- Strong Performance: The AIMv2 family consists of vision models pre-trained with a multimodal autoregressive objective. The original AIMv2 models are reported as:
  - Outperforming OpenAI CLIP and SigLIP on the majority of multimodal understanding benchmarks.
  - Surpassing DINOv2 in open-vocabulary object detection and referring expression comprehension.
  - Demonstrating strong recognition performance, with AIMv2-3B achieving 89.5% on ImageNet using a frozen trunk.
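"Frozen trunk" means the pre-trained backbone weights stay fixed while only a classification head is trained on top. A minimal sketch of that setup follows. It relies only on the `named_parameters()` interface that PyTorch modules expose. The helper name `freeze_trunk` is hypothetical, and the head prefix `"classifier"` is an assumption about how the head is named in this model, so inspect the parameter names before relying on it.

```python
def freeze_trunk(model, head_prefix="classifier"):
    """Disable gradients for every parameter outside the head.

    Returns the names of the parameters left trainable.
    `head_prefix` is an assumed naming convention, not confirmed
    for this repository.
    """
    trainable = []
    for name, param in model.named_parameters():
        is_head = name.startswith(head_prefix)
        # Only head parameters keep receiving gradient updates.
        param.requires_grad = is_head
        if is_head:
            trainable.append(name)
    return trainable
```

Calling `freeze_trunk(model)` before building an optimizer, and checking that the returned list is not empty, is one way to confirm the prefix matched the head.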

💻 Usage Examples

Basic Usage

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Download a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_id = "amaye15/aimv2-large-patch14-native-image-classification"
processor = AutoImageProcessor.from_pretrained(model_id)
# trust_remote_code=True allows the repository's custom model code to run
model = AutoModelForImageClassification.from_pretrained(
    model_id,
    trust_remote_code=True,
)

inputs = processor(images=image, return_tensors="pt")
# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Get predicted class (labels may be generic placeholders until the model is fine-tuned)
predictions = outputs.logits.softmax(dim=-1)
predicted_class = predictions.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class]}")
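The example above reports only the single most likely class. When inspecting a classifier it can help to see several candidates with their probabilities. The following is a minimal sketch in plain Python, so it runs without downloading the model. The helper name `top_k_labels` and the sample logits and labels are made up for illustration.

```python
import math


def top_k_labels(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs."""
    # Subtract the max logit for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits and labels, for illustration only.
example = top_k_labels([2.0, 0.5, 1.0], {0: "cat", 1: "dog", 2: "remote"}, k=2)
for label, prob in example:
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the basic usage example, `logits` would be `outputs.logits[0].tolist()` and `id2label` would be `model.config.id2label`.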

📚 Documentation

Model Details

Property	Details
Model Name	`amaye15/aimv2-large-patch14-native-image-classification`
Original Model	`apple/aimv2-large-patch14-native`
Adaptation	Modified to be compatible with `AutoModelForImageClassification` for direct use in image classification tasks.
Framework	PyTorch

Citation

If you use this model or find it helpful, please consider citing the original AIMv2 paper:

@article{fini2024multimodal,
  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
  author={Fini, Enrico and Shukor, Mustafa and others},
  journal={arXiv preprint arXiv:2411.14402},
  year={2024}
}

📄 License

This project is licensed under the MIT License.
