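🚀 vit_base_patch16_224.orig_in21k Model Card
A Vision Transformer (ViT) image classification model. It was pretrained on the ImageNet-21k dataset in JAX by the paper authors and ported to PyTorch by Ross Wightman. The model has no classification head, so it is suitable only for feature extraction and fine-tuning.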
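🚀 Quick Start
The model is a Vision Transformer (ViT) backbone pretrained on ImageNet-21k. It can be used to extract image embeddings and as a starting point for fine-tuning on image classification tasks.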
✨ Key Features
- Model type: image classification / feature backbone
- Model stats:
  - Parameters (M): 85.8
  - GMACs: 16.9
  - Activations (M): 16.5
  - Image size: 224 x 224
- Papers:
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
- Dataset: ImageNet-21k
- Original code: https://github.com/google-research/vision_transformer
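The parameter count listed above can be sanity-checked with a little arithmetic. The sketch below is not taken from the model's code. It assumes the standard ViT-Base hyperparameters from the cited paper (12 layers, hidden size 768, MLP size 3072), a single class token, learned position embeddings, and no classification head. All variable and helper names are illustrative.

```python
# Back-of-the-envelope check of the token and parameter counts.
# Assumed ViT-Base hyperparameters (from the ViT paper, not stated in this card).
image_size = 224   # from the model stats above
patch_size = 16    # from the model name (patch16)
in_channels = 3    # RGB input (assumption)
embed_dim = 768    # hidden size (assumption)
depth = 12         # number of transformer blocks (assumption)
mlp_dim = 3072     # MLP hidden size (assumption)


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm_params(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


# The image is cut into non-overlapping patches; one class token is prepended.
num_patches = (image_size // patch_size) ** 2
num_tokens = num_patches + 1

# The patch embedding is a strided convolution, which has the same parameter
# count as a linear layer over the flattened patch.
patch_embed = linear_params(patch_size * patch_size * in_channels, embed_dim)
cls_token = embed_dim
pos_embed = num_tokens * embed_dim

# One transformer block: LayerNorm, attention (qkv + output projection),
# LayerNorm, two-layer MLP.
block = (
    layer_norm_params(embed_dim)
    + linear_params(embed_dim, 3 * embed_dim)
    + linear_params(embed_dim, embed_dim)
    + layer_norm_params(embed_dim)
    + linear_params(embed_dim, mlp_dim)
    + linear_params(mlp_dim, embed_dim)
)

# Final LayerNorm after the last block; no classification head.
total_params = (
    patch_embed + cls_token + pos_embed + depth * block + layer_norm_params(embed_dim)
)

print("tokens per image:", num_tokens)
print("parameters (M):", round(total_params / 1e6, 1))
```

Under these assumptions the total rounds to the 85.8 M figure in the stats list, which is consistent with the checkpoint carrying no classification head.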
💻 Usage Examples
Basic Usage
Image Classification
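from urllib.request import urlopen
from PIL import Image
import timm
import torch  # required for torch.topk below

# Download a sample image
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_base_patch16_224.orig_in21k', pretrained=True)
model = model.eval()

# Build the model-specific preprocessing (resize, normalization)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze a single image into a batch of 1

# Note: this checkpoint ships without a classification head, so these top-5
# values only become class probabilities once a head has been fine-tuned.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)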
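The last line of the example above converts raw scores into percentages with a softmax and then keeps the five largest. The following is a minimal pure-Python sketch of that step, intended only to show what the computation does. The input scores are made up, and the helper names `softmax_percent` and `top_k` are illustrative, not part of timm or PyTorch.

```python
import math


def softmax_percent(logits):
    """Softmax over a list of scores, scaled to percentages."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return the k largest values and their indices, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Made-up scores for six hypothetical classes.
logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.0]

percentages = softmax_percent(logits)
top_values, top_indices = top_k(percentages, k=5)

for value, index in zip(top_values, top_indices):
    print(f"class {index}: {value:.2f}%")
```

The percentages sum to 100 across all classes, so the top-5 values sum to slightly less than that whenever there are more than five classes.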
Advanced Usage
Image Embeddings
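from urllib.request import urlopen
from PIL import Image
import timm

# Download a sample image
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_base_patch16_224.orig_in21k',
    pretrained=True,
    num_classes=0,  # remove the classifier nn.Linear
)
model = model.eval()

# Build the model-specific preprocessing (resize, normalization)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# Option 1: call the model directly
output = model(transforms(img).unsqueeze(0))  # (batch_size, num_features) shaped tensor

# Option 2: equivalent two-step form (does not require num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))  # unpooled, (1, 197, 768) shaped tensor
output = model.forward_head(output, pre_logits=True)  # pooled, (1, num_features) shaped tensor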
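One common use of pooled embeddings like these is comparing two images by cosine similarity. The sketch below is an illustration only and not something the model card prescribes. It uses short made-up vectors in place of real 768-dimensional embeddings, and `cosine_similarity` is a hypothetical helper written here, not a timm function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional stand-ins for real embeddings.
emb_a = [1.0, 2.0, 0.0, 0.0]
emb_b = [2.0, 4.0, 0.0, 0.0]   # same direction as emb_a
emb_c = [-2.0, 1.0, 0.0, 0.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

With real model output, each `(1, num_features)` tensor could be turned into a plain list via `output[0].tolist()` before being passed to the helper.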
📚 Detailed Documentation
Dataset and runtime metrics for this model can be explored in the timm model results.
📄 License
This project is licensed under Apache-2.0.
🔗 Citation
@article{dosovitskiy2020vit,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={ICLR},
year={2021}
}
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}