vit_small_patch16_224.dino: Open-Source Image Feature Model


Vit Small Patch16 224.dino

Developed by timm

A Vision Transformer (ViT) image feature model trained with the self-supervised DINO method, suited to image classification and feature extraction.

Image Classification

Transformers

Open-source license: Apache-2.0 #SelfSupervisedViT #SmallViT #ImageFeatureExtraction

Downloads: 70.62k

Release date: 12/22/2022

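Model Overview

This model is a Vision Transformer (ViT) image feature model trained with the self-supervised DINO method. It is mainly used for image classification or as a feature backbone, and it can be applied to a range of computer vision tasks.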

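Model Features

Self-Supervised Learning

The model is trained with the DINO self-supervised method, so it learns useful visual representations without requiring large amounts of labeled data.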
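As a rough illustration of the self-distillation idea behind DINO (Caron et al., 2021), the sketch below computes the loss for one pair of views: the teacher output is centered and sharpened with a low temperature, and the student is scored by cross-entropy against it. The temperatures (student 0.1, teacher 0.04) and the center momentum of 0.9 follow values used in the DINO paper and code; the function names and the plain-list representation are assumptions for illustration. Real training also uses multi-crop views, an EMA-updated teacher network and backpropagation, none of which are shown here.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled log-softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student_logits: List[float], teacher_logits: List[float],
              center: List[float], student_temp: float = 0.1,
              teacher_temp: float = 0.04) -> float:
    """Cross-entropy between the centered, sharpened teacher distribution and
    the student distribution for one pair of views. The teacher side is used
    only as a fixed target (in training, no gradient flows into it)."""
    # Centering: subtract a running mean of teacher outputs (helps avoid collapse).
    centered = [t - c for t, c in zip(teacher_logits, center)]
    # Sharpening: the low teacher temperature makes the target distribution peakier.
    p_teacher = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def update_center(center: List[float], teacher_batch: List[List[float]],
                  momentum: float = 0.9) -> List[float]:
    """EMA update of the center toward the batch mean of teacher outputs."""
    n = len(teacher_batch)
    batch_mean = [sum(col) / n for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]


teacher = [2.0, 0.5, -1.0]
center = [0.0, 0.0, 0.0]
print(dino_loss([2.0, 0.5, -1.0], teacher, center))  # student agrees with teacher
print(dino_loss([-1.0, 0.5, 2.0], teacher, center))  # student disagrees
```

A student whose output agrees with the sharpened teacher target gets a loss near zero, while one that ranks the classes in the opposite order is penalized heavily; no labels are involved at any point.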

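Efficient Architecture

Built on the Vision Transformer architecture, with 21.7M parameters and 4.3 GMACs of compute, which makes it suitable for workloads with moderate compute budgets.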
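The 21.7M figure can be checked by hand. The sketch below assumes the standard ViT-S/16 layout (embedding width 384, 12 pre-norm blocks, MLP ratio 4, biased linear layers, a learned class token and position embedding); these hyperparameters come from the usual small-ViT configuration rather than from this card. Counted without a classifier head the total is about 21.67M, which matches the card; adding a 1000-class linear head would bring it to about 22.05M.

```python
def vit_param_count(img_size=224, patch=16, in_chans=3, dim=384, depth=12,
                    mlp_ratio=4, num_classes=0):
    """Count parameters of a plain ViT (pre-norm blocks, biased linear layers)."""
    num_patches = (img_size // patch) ** 2
    patch_embed = in_chans * patch * patch * dim + dim  # patch conv weight + bias
    cls_token = dim
    pos_embed = (num_patches + 1) * dim                 # +1 for the class token
    hidden = dim * mlp_ratio
    per_block = (
        2 * dim                     # LayerNorm 1 (weight + bias)
        + dim * 3 * dim + 3 * dim   # fused qkv projection
        + dim * dim + dim           # attention output projection
        + 2 * dim                   # LayerNorm 2
        + dim * hidden + hidden     # MLP fc1
        + hidden * dim + dim        # MLP fc2
    )
    final_norm = 2 * dim
    head = dim * num_classes + num_classes if num_classes else 0
    return patch_embed + cls_token + pos_embed + depth * per_block + final_norm + head


total = vit_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Almost all of the parameters sit in the 12 transformer blocks (about 1.77M each), so the count scales roughly linearly with depth and quadratically with width.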

Multi-Task Support

Besides image classification, the model can serve as a feature extraction backbone for a variety of downstream computer vision tasks.

Model Capabilities

Image feature extraction

Image classification

Support for computer vision tasks

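Use Cases

Computer Vision

Image Classification

Classifies an input image and outputs a probability distribution over classes.

Reported to perform well on the ImageNet-1k dataset (see the timm model results for figures)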
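Going from raw model outputs to a class probability distribution is a softmax followed by a top-k selection, which is what the `torch.topk(output.softmax(dim=1) * 100, k=5)` line in the usage example computes. The plain-Python sketch below reproduces that arithmetic on made-up logits; the helper names and the six-class input are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_percent(logits: List[float], k: int = 5) -> List[Tuple[float, int]]:
    """Return (probability in percent, class index) for the k most likely classes."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(probs[i] * 100, i) for i in ranked]


logits = [1.0, 3.0, 0.5, 2.0, -1.0, 0.0]  # hypothetical logits for 6 classes
for pct, idx in top_k_percent(logits, k=3):
    print(f"class {idx}: {pct:.1f}%")
```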

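Feature Extraction

Extracts deep feature representations of images for downstream tasks such as object detection and image retrieval.

Provides a 384-dimensional feature vector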
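For image retrieval, one common approach (an assumption here, not something the card prescribes) is to L2-normalize the 384-dimensional embeddings and rank a gallery by cosine similarity to a query. The sketch uses toy 3-dimensional vectors and hypothetical file names in place of real model outputs; with the model, each vector would be a row of the `(1, num_features)` tensor returned by the image embedding example.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_rank(query: List[float],
                gallery: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank gallery items by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = []
    for name, vec in gallery.items():
        g = l2_normalize(vec)
        scores.append((name, sum(a * b for a, b in zip(q, g))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy embeddings standing in for 384-dim model features (hypothetical data).
gallery = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.6, 0.1],
    "car.jpg": [0.0, 0.2, 0.95],
}
for name, score in cosine_rank([1.0, 0.2, 0.0], gallery):
    print(f"{name}: {score:.3f}")
```

Normalizing first makes the ranking depend only on the direction of each feature vector, not its magnitude; for large galleries the same idea is usually implemented as one matrix product over pre-normalized features.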

🚀 Model card for vit_small_patch16_224.dino

A Vision Transformer (ViT) image feature model, trained with the self-supervised DINO method.

🚀 Quick Start

This model can be used for image classification and image embedding. The sections below show how.

✨ Key Features

A backbone model suited to image classification and image feature extraction.
Trained with the self-supervised DINO method.

📦 Installation

To use this model, install the timm library:

pip install timm

💻 Usage Examples

Basic Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import torch  # needed for torch.topk below
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_small_patch16_224.dino', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

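# top-5 softmax entries as percentages, with their indices
# (note: timm's DINO weights may ship without an ImageNet classifier head; if so,
# `output` is a pooled feature vector and these are not ImageNet class scores)
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)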
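For evaluation, the transform built by `timm.data.create_transform` resizes and center-crops the image to 224 x 224, converts it to a tensor in [0, 1], and normalizes each channel. The sketch below shows only the per-pixel normalization step, assuming the ImageNet mean and standard deviation that timm exposes as `IMAGENET_DEFAULT_MEAN` / `IMAGENET_DEFAULT_STD`; check `data_config` for the values actually resolved for this model. The helper name `normalize_pixel` is hypothetical.

```python
from typing import Sequence, Tuple

# Assumed values: timm's IMAGENET_DEFAULT_MEAN / IMAGENET_DEFAULT_STD (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int],
                    mean: Sequence[float] = IMAGENET_MEAN,
                    std: Sequence[float] = IMAGENET_STD) -> Tuple[float, ...]:
    """Map one 8-bit RGB pixel to normalized model input values."""
    # Scale to [0, 1] (as ToTensor does), then standardize per channel.
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


print(normalize_pixel((0, 0, 0)))        # black pixel
print(normalize_pixel((255, 255, 255)))  # white pixel
```

Feeding the model images normalized with different statistics than those used in pretraining usually degrades the features, which is why the example resolves the transform from the model itself.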

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_small_patch16_224.dino',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 197, 384) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
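The `(1, 197, 384)` shape of the unpooled output follows from the patch arithmetic: a 224 x 224 image cut into 16 x 16 patches gives a 14 x 14 grid of 196 patch tokens, plus one class token, each of width 384. The sketch reproduces that count and, on a toy token list, the class-token pooling that `forward_head(..., pre_logits=True)` is assumed to perform for this model (timm ViTs configured with `global_pool='token'`). It is an illustration with hypothetical helper names, not timm's implementation.

```python
from typing import List, Tuple


def vit_token_shape(img_size: int = 224, patch: int = 16, dim: int = 384,
                    num_prefix_tokens: int = 1, batch: int = 1) -> Tuple[int, int, int]:
    """Shape of the unpooled ViT output: (batch, patch tokens + prefix tokens, width)."""
    grid = img_size // patch
    return (batch, grid * grid + num_prefix_tokens, dim)


def cls_token_pool(tokens: List[List[List[float]]]) -> List[List[float]]:
    """Class-token pooling: keep token 0 of each sequence in the batch."""
    return [seq[0] for seq in tokens]


print(vit_token_shape())  # (1, 197, 384)
```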

📚 Documentation

Model Details

Attribute	Details
Model type	Image classification / feature backbone
Params (M)	21.7
GMACs	4.3
Activations (M)	8.2
Image size	224 x 224
Papers	Emerging Properties in Self-Supervised Vision Transformers: https://arxiv.org/abs/2104.14294; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
Pretraining dataset	ImageNet-1k
Original repository	https://github.com/facebookresearch/dino

Model Comparison

See the timm model results for this model's dataset and runtime metrics.

📄 License

This model is released under the Apache-2.0 license.

Citation

@inproceedings{caron2021emerging,
  title={Emerging properties in self-supervised vision transformers},
  author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J{\'e}gou, Herv{\'e} and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={9650--9660},
  year={2021}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}