AIMv2-Large-Patch14-Native: a free, open-source image classification model


Aimv2 Large Patch14 Native Image Classification

Developed by amaye15

AIMv2-Large-Patch14-Native is an adapted image classification model. It modifies the original AIMv2 model so that it is compatible with the AutoModelForImageClassification class in Hugging Face Transformers.

Image classification

Transformers

License: MIT #Multimodal pretraining #Open-vocabulary classification #High-accuracy visual recognition

Downloads: 15

Released: 11/25/2024

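Model overview

This model is an adapted version of the original AIMv2 model, modified to be compatible with the AutoModelForImageClassification class in Hugging Face Transformers for image classification tasks.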

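Model features

Multimodal autoregressive pre-training

AIMv2 models are pre-trained with a multimodal autoregressive objective and perform strongly across a range of benchmarks.

Compatible with Hugging Face Transformers

After adaptation, the model can be loaded directly with AutoModelForImageClassification, which makes it easy to integrate into existing workflows.

High performance

The AIMv2 family outperforms OpenAI CLIP and SigLIP on most multimodal understanding benchmarks, and outperforms DINOv2 on open-vocabulary object detection and referring expression comprehension.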

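Model capabilities

Image classification

Visual understanding

Use cases

Computer vision

General image classification

Classify an input image by identifying its main object or scene.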

🚀 AIMv2-Large-Patch14-Native Image Classification

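This project provides an adapted version of the original AIMv2 model that is compatible with the AutoModelForImageClassification class in the Hugging Face Transformers library, so it can be used for image classification with the standard Transformers API.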

Original AIMv2 paper | BibTeX citation

Note: this model has not yet been trained or fine-tuned.
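Because the checkpoint has not been fine-tuned, it would typically need a classification head sized for a target label set before training. The sketch below shows one way that setup might look. `build_label_maps` and `load_for_finetuning` are hypothetical helper names, the three class names are made up, and it is an assumption that the adapted remote code sizes its head from `config.num_labels` the way standard Transformers classification models do.

```python
from typing import Dict, List, Tuple


def build_label_maps(class_names: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id2label / label2id dictionaries stored in a Transformers config."""
    id2label = {idx: name for idx, name in enumerate(class_names)}
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(class_names: List[str]):
    """Load the adapted checkpoint with a head sized for `class_names`.

    Assumption: the remote model code reads `config.num_labels` to size its head.
    """
    # Imported lazily so the label helper above works without transformers installed.
    from transformers import AutoModelForImageClassification

    id2label, label2id = build_label_maps(class_names)
    return AutoModelForImageClassification.from_pretrained(
        "amaye15/aimv2-large-patch14-native-image-classification",
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,  # let a differently sized head be re-initialised
        trust_remote_code=True,
    )


# Hypothetical label set, for illustration only.
id2label, label2id = build_label_maps(["cat", "dog", "remote"])
print(id2label)
print(label2id)
```

Calling `load_for_finetuning(["cat", "dog", "remote"])` would download the checkpoint, so it is left out of the runnable part of the sketch.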

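🚀 Quick Start

This repository contains an adapted version of the original AIMv2 model. It has been modified to work with the AutoModelForImageClassification class in the Hugging Face Transformers library, so the model can be loaded for image classification like any other Transformers classifier.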

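✨ Key Features

We adapted the original apple/aimv2-large-patch14-native model to work with AutoModelForImageClassification. The AIMv2 family consists of vision models pre-trained with a multimodal autoregressive objective, and it performs well on multiple benchmarks.

Some highlights of the AIMv2 models include:

They outperform OpenAI CLIP and SigLIP on most multimodal understanding benchmarks.
They outperform DINOv2 on open-vocabulary object detection and referring expression comprehension.
They show strong recognition performance: AIMv2-3B reaches 89.5% accuracy on ImageNet with a frozen trunk.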
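The frozen-trunk result refers to training only a classification head while the pre-trained backbone stays fixed. A minimal sketch of that idea for a PyTorch-style model follows. It is not the evaluation protocol from the AIMv2 paper. `freeze_backbone` is a hypothetical helper, and the `"classifier"` prefix is an assumed parameter name that should be checked against `model.named_parameters()` for this checkpoint.

```python
from typing import List


def freeze_backbone(model, head_prefix: str = "classifier") -> List[str]:
    """Disable gradients for every parameter outside the classification head.

    `head_prefix` is an assumed parameter-name prefix, not a confirmed one.
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        is_head = name.startswith(head_prefix)
        # Only head parameters keep receiving gradient updates.
        param.requires_grad = is_head
        if is_head:
            trainable.append(name)
    return trainable
```

After calling `freeze_backbone(model)`, an optimizer would normally be given only the parameters whose `requires_grad` is still `True`. If the returned list is empty, the assumed prefix does not match this model's head.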

💻 Usage Examples

Basic usage

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Download a sample image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the adapted model.
# The model is loaded with trust_remote_code=True, which runs code from the Hub repository.
processor = AutoImageProcessor.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
)
model = AutoModelForImageClassification.from_pretrained(
    "amaye15/aimv2-large-patch14-native-image-classification",
    trust_remote_code=True,
)

# Preprocess the image and run inference without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class.
predictions = outputs.logits.softmax(dim=-1)
predicted_class = predictions.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class]}")
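The last step of the example turns logits into probabilities and keeps only the top class. To see several candidates with their scores, the same computation can be sketched in plain Python. `softmax` and `top_k` here are illustrative helpers written for this sketch, and the logits and label names are made-up values, not output from this model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(logits: Sequence[float], id2label: Dict[int, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda idx: probs[idx], reverse=True)
    # Fall back to the numeric index when a class has no label entry.
    return [(id2label.get(idx, str(idx)), probs[idx]) for idx in ranked[:k]]


# Made-up logits and labels, for illustration only.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
for label, prob in top_k(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model from the basic usage example, `outputs.logits[0].tolist()` would supply the logits for a single image and `model.config.id2label` the label mapping.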

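📚 Documentation

Model details

| Property | Details |
| --- | --- |
| Model name | `amaye15/aimv2-large-patch14-native-image-classification` |
| Original model | `apple/aimv2-large-patch14-native` |
| Adaptation | Modified for compatibility with `AutoModelForImageClassification`, so it can be used directly for image classification tasks |
| Framework | PyTorch |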

📄 License

This project is released under the MIT License.

📄 Citation

If you use this model or find it helpful, please consider citing the original AIMv2 paper:

@article{fini2024multimodal,
  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
  author={Fini, Enrico and others},
  journal={arXiv preprint arXiv:2411.14402},
  year={2024}
}