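🚀 Model card for vit_small_patch16_224.dino
A Vision Transformer (ViT) image feature model trained with the self-supervised DINO method. It can be used for tasks such as image feature extraction.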
🚀 Quick Start
The model is loaded through timm and can be used for image classification and feature extraction; see the usage examples below.
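✨ Key Features
- Trained with the self-supervised DINO method, which learns image features without using labels.
- Supports image classification and image embedding extraction.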
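📦 Installation
This model card does not list installation steps. Install the timm library by following its official installation instructions.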
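As a quick sanity check after installation, the following minimal sketch reports whether the packages imported by the usage examples can be found. The helper name `check_packages` is hypothetical, and the package list (timm, torch, PIL) is inferred from the imports in this card rather than taken from an official requirements list.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages imported by the usage examples in this card (an assumption,
# not an official requirements list).
status = check_packages(["timm", "torch", "PIL"])
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing (install it first)'}")
```

Any package reported as missing needs to be installed before the examples will run.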
💻 Usage Examples
Basic usage
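from urllib.request import urlopen
from PIL import Image
import timm
import torch  # required for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_small_patch16_224.dino', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# Caveat: DINO weights are self-supervised. If model.num_classes is 0, there is no
# classifier head and the values below rank feature dimensions, not ImageNet classes.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)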
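The last line of the example turns the raw model outputs into percentages with softmax and keeps the five largest. The following minimal sketch reproduces that arithmetic in plain Python on a made-up list of scores, so the steps are visible without downloading weights. The function name `softmax_topk` and the sample values are illustrative assumptions; in the example itself `torch.topk` performs this step on tensors.

```python
import math


def softmax_topk(scores, k=5):
    """Return (percentages, indices) of the k largest softmax values."""
    # Subtract the max for numerical stability; this does not change the result.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    percents = [100.0 * e / total for e in exps]
    # Sort indices by percentage, largest first, and keep the first k.
    order = sorted(range(len(scores)), key=lambda i: percents[i], reverse=True)[:k]
    return [percents[i] for i in order], order


# Made-up scores for six hypothetical output positions.
top_percent, top_idx = softmax_topk([2.0, 0.5, 1.0, 3.0, -1.0, 0.0], k=5)
print(top_idx)  # [3, 0, 2, 1, 5]
print([round(p, 1) for p in top_percent])
```

The percentages over all positions sum to 100, so the top five always sum to at most 100.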
Advanced usage
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_small_patch16_224.dino',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is a (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 197, 384) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
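A common use of the pooled embedding is comparing two images by cosine similarity. The following is a minimal sketch in plain Python; the function name `cosine_similarity` and the short toy vectors are illustrative assumptions standing in for the model's embeddings after conversion to Python lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional stand-ins for two image embeddings.
emb_a = [1.0, 2.0, 0.0, -1.0]
emb_b = [2.0, 4.0, 0.0, -2.0]   # same direction as emb_a
emb_c = [2.0, -1.0, 3.0, 0.0]   # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))  # close to 1: same direction
print(cosine_similarity(emb_a, emb_c))  # 0.0: unrelated directions
```

Values near 1 indicate embeddings pointing in the same direction, and values near 0 indicate unrelated ones.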
📚 Documentation
Model details
| Attribute | Details |
| --- | --- |
| Model type | Image classification / feature backbone |
| Model stats | Params (M): 21.7; GMACs: 4.3; Activations (M): 8.2; Image size: 224 x 224 |
| Papers | Emerging Properties in Self-Supervised Vision Transformers: https://arxiv.org/abs/2104.14294; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2 |
| Pretraining dataset | ImageNet-1k |
| Original code | https://github.com/facebookresearch/dino |
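The token and parameter figures in the table can be roughly cross-checked from the architecture. The sketch below assumes the standard ViT-Small configuration: embedding width 384, depth 12, MLP ratio 4, a class token, learned position embeddings, biases on all linear layers, and no classifier head. These hyperparameters are not stated in this card, so treat them as assumptions; `vit_param_count` is a hypothetical helper, not part of timm.

```python
def vit_param_count(img_size=224, patch=16, dim=384, depth=12, mlp_ratio=4, in_chans=3):
    """Estimate token and parameter counts for a plain ViT without a classifier head."""
    num_patches = (img_size // patch) ** 2
    num_tokens = num_patches + 1                            # patches + class token
    patch_embed = patch * patch * in_chans * dim + dim      # projection weight + bias
    cls_and_pos = dim + num_tokens * dim                    # class token + position embeddings
    attn = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)    # qkv + output projection
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)    # two linear layers
    norms = 2 * (2 * dim)                                   # two LayerNorms per block
    block = attn + mlp + norms
    final_norm = 2 * dim
    total = patch_embed + cls_and_pos + depth * block + final_norm
    return num_tokens, total


tokens, params = vit_param_count()
print(tokens)                  # 197
print(round(params / 1e6, 1))  # 21.7
```

Under these assumptions the estimate agrees with the 21.7 M parameters listed in the table, and the 197 tokens correspond to 14 x 14 patches plus one class token.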
Model comparison
Dataset and runtime metrics for this model are available in the timm model results.
📄 License
This project is licensed under the Apache-2.0 license.
📚 Citation
@inproceedings{caron2021emerging,
title={Emerging properties in self-supervised vision transformers},
author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J{\'e}gou, Herv{\'e} and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={9650--9660},
year={2021}
}
@article{dosovitskiy2020vit,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={ICLR},
year={2021}
}
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}