AIMv2-Large-Patch14-Native: Free, Open-Source Image Classification Model

Aimv2 Large Patch14 Native Image Classification

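Developed by amaye15

AIMv2-Large-Patch14-Native is an adapted image classification model: a modified version of the original AIMv2 model that is compatible with the AutoModelForImageClassification class in Hugging Face Transformers.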

Image Classification

Transformers

License: MIT #multimodal-pretraining #open-vocabulary-classification #high-accuracy-visual-recognition

Downloads: 15

Released: 11/25/2024

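Model Overview

This model is an adapted version of the original AIMv2 model, modified to work with the AutoModelForImageClassification class in Hugging Face Transformers for image classification tasks.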

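Model Features

Multimodal autoregressive pre-training

AIMv2 models are pre-trained with a multimodal autoregressive objective and perform strongly across a range of benchmarks.

Compatible with Hugging Face Transformers

After adaptation, the model can be loaded directly with AutoModelForImageClassification, which makes it easy to integrate into existing workflows.

High performance

The AIMv2 family outperforms OAI CLIP and SigLIP on most multimodal understanding benchmarks, and outperforms DINOv2 on open-vocabulary object detection and referring expression comprehension.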

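Model Capabilities

Image classification

Visual understanding

Use Cases

Computer vision

General image classification

Classifies an input image by identifying its main object or scene.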

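🚀 AIMv2-Large-Patch14-Native Image Classification

This project provides an adapted version of the original AIMv2 model that is compatible with the AutoModelForImageClassification class in the Hugging Face Transformers library, so it can be used directly for image classification tasks.

Original AIMv2 paper | BibTeX citation

Note: this model has not yet been trained or fine-tuned.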
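
Since the model has not been trained or fine-tuned, a likely first step before fine-tuning is to attach your own label set. The following is a minimal sketch rather than the author's procedure: `build_label_maps` and `load_for_finetuning` are hypothetical helpers, the class names are placeholders, and it assumes the repository's remote code honors the standard `num_labels`, `id2label`, and `label2id` configuration overrides.

```python
def build_label_maps(class_names):
    """Return (id2label, label2id) dicts for a list of class names."""
    id2label = dict(enumerate(class_names))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(
    class_names,
    model_id="amaye15/aimv2-large-patch14-native-image-classification",
):
    """Load the adapted checkpoint with a head sized for `class_names`.

    Assumption: the remote code accepts the usual config overrides below.
    """
    # Imported lazily so the label helper works without transformers installed.
    from transformers import AutoModelForImageClassification

    id2label, label2id = build_label_maps(class_names)
    return AutoModelForImageClassification.from_pretrained(
        model_id,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,  # re-initialize the head if its shape differs
        trust_remote_code=True,
    )


# Placeholder classes, used only to show the shape of the mappings.
id2label, label2id = build_label_maps(["cat", "dog", "bird"])
print(id2label)
print(label2id)
```

Calling `load_for_finetuning(["cat", "dog", "bird"])` would download the checkpoint, so it is left out of the runnable part of the sketch.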

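🚀 Quick Start

This repository contains an adapted version of the original AIMv2 model, modified to be compatible with the AutoModelForImageClassification class in the Hugging Face Transformers library so that it can be used for image classification.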

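✨ Key Features

We adapted the original apple/aimv2-large-patch14-native model to work with AutoModelForImageClassification. The AIMv2 family consists of vision models pre-trained with a multimodal autoregressive objective, and it performs well across multiple benchmarks.

Highlights of the AIMv2 models include:

Outperforms OAI CLIP and SigLIP on most multimodal understanding benchmarks.
Outperforms DINOv2 on open-vocabulary object detection and referring expression comprehension.
Shows strong recognition performance: AIMv2-3B reaches 89.5% accuracy on ImageNet with a frozen trunk.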
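
The ImageNet figure above was obtained with a frozen trunk. As a rough illustration of what a frozen-trunk setup could look like when fine-tuning this adapted model, the sketch below disables gradients for every parameter outside the classification head. It is not the evaluation protocol from the paper: `freeze_backbone` is a hypothetical helper, the `classifier` prefix is an assumption about how the head is named, and `_ToyModel` is a stand-in that only mimics the `named_parameters()` interface of a PyTorch module.

```python
from types import SimpleNamespace


def freeze_backbone(model, head_prefix="classifier"):
    """Disable gradients for every parameter outside the classification head.

    `head_prefix` is an assumption: inspect `model.named_parameters()` to find
    the real name of the head in the loaded model.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        is_head = name.startswith(head_prefix)
        param.requires_grad = is_head
        if is_head:
            trainable.append(name)
    return trainable


class _ToyModel:
    """Stand-in exposing named_parameters(), so the sketch runs offline."""

    def __init__(self):
        self._params = {
            "trunk.blocks.0.weight": SimpleNamespace(requires_grad=True),
            "trunk.blocks.0.bias": SimpleNamespace(requires_grad=True),
            "classifier.weight": SimpleNamespace(requires_grad=True),
            "classifier.bias": SimpleNamespace(requires_grad=True),
        }

    def named_parameters(self):
        return self._params.items()


toy = _ToyModel()
trainable_names = freeze_backbone(toy)
print(trainable_names)
```

With a real PyTorch model the same function applies unchanged, because `requires_grad` can be assigned on each parameter; only the head prefix needs checking.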

💻 Usage Examples

Basic Usage

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Download a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the adapted model (the model relies on custom code)
processor = AutoImageProcessor.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
)
model = AutoModelForImageClassification.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
    trust_remote_code=True,
)

# Preprocess the image and run inference without tracking gradients
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Get predicted class
predictions = outputs.logits.softmax(dim=-1)
predicted_class = predictions.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class]}")
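
The basic usage example reports only the single most likely class. A top-k view is often more informative, so here is a minimal sketch of that step in plain Python, written so it runs without downloading the model. `top_k_predictions` is a hypothetical helper, and the logits and labels are made up for illustration; with the real model you could pass `outputs.logits[0].tolist()` and `model.config.id2label` instead. Keep in mind that the probabilities are not meaningful until the model has been fine-tuned.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label.get(i, f"LABEL_{i}"), probs[i]) for i in ranked]


# Made-up logits and labels, for illustration only.
example_logits = [2.0, 0.5, -1.0, 3.0]
example_labels = {0: "cat", 1: "dog", 2: "bird", 3: "remote"}

results = top_k_predictions(example_logits, example_labels, k=2)
for label, prob in results:
    print(f"{label}: {prob:.3f}")
```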

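📚 Detailed Documentation

Model Details

Property	Details
Model name	`amaye15/aimv2-large-patch14-native-image-classification`
Original model	`apple/aimv2-large-patch14-native`
Adaptation	Modified for compatibility with `AutoModelForImageClassification`; can be used directly for image classification tasks
Framework	PyTorch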

📄 License

This project is released under the MIT License.

📄 Citation

If you use this model or find it helpful, please consider citing the original AIMv2 paper:

@article{fini2024aimv2,
  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
  author={Fini, Enrico and Shukor, Mustafa and others},
  journal={arXiv preprint arXiv:2411.14402},
  year={2024}
}