🚀 vit_giant_patch14_reg4_dinov2.lvd142m
このモデルは、Vision Transformer (ViT) の画像特徴抽出モデルで、レジスタを備えています。自己教師付き学習のDINOv2手法を用いて、LVD - 142Mデータセットで事前学習されています。
📚 ドキュメント
モデルの詳細
属性 |
详情 |
モデルタイプ |
画像分類 / 特徴抽出バックボーン |
モデル統計情報 |
パラメータ数 (M): 1136.5 GMACs: 1558.1 アクティベーション (M): 874.4 画像サイズ: 518 x 518 |
関連論文 |
Vision Transformers Need Registers: https://arxiv.org/abs/2309.16588 DINOv2: Learning Robust Visual Features without Supervision: https://arxiv.org/abs/2304.07193 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2 |
オリジナルリポジトリ |
https://github.com/facebookresearch/dinov2 |
事前学習データセット |
LVD - 142M |
モデルの使用方法
基本的な使用法
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model('vit_giant_patch14_reg4_dinov2.lvd142m', pretrained=True)
model = model.eval()
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0))
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
高度な使用法
from urllib.request import urlopen
from PIL import Image
import timm
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
model = timm.create_model(
'vit_giant_patch14_reg4_dinov2.lvd142m',
pretrained=True,
num_classes=0,
)
model = model.eval()
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
output = model(transforms(img).unsqueeze(0))
output = model.forward_features(transforms(img).unsqueeze(0))
output = model.forward_head(output, pre_logits=True)
モデルの比較
timmのモデル結果で、このモデルのデータセットと実行時のメトリクスを確認できます。
引用
@article{darcet2023vision,
title={Vision Transformers Need Registers},
author={Darcet, Timoth{'e}e and Oquab, Maxime and Mairal, Julien and Bojanowski, Piotr},
journal={arXiv preprint arXiv:2309.16588},
year={2023}
}
@misc{oquab2023dinov2,
title={DINOv2: Learning Robust Visual Features without Supervision},
author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
journal={arXiv:2304.07193},
year={2023}
}
@article{dosovitskiy2020vit,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={ICLR},
year={2021}
}
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
📄 ライセンス
このモデルはApache - 2.0ライセンスの下で提供されています。