vit_base_patch16_224.dino open-source image model: free to use for image classification and feature extraction

Vit Base Patch16 224.dino

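Developed by timm

A Vision Transformer (ViT) image feature model trained with the self-supervised DINO method, suited to image classification and feature extraction tasks.

Image classification

Transformers

Open-source license: Apache-2.0 #self-supervised-learning #image-feature-extraction #vision-transformer

Downloads: 33.45k

Release date: 12/22/2022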

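Model overview

This model is a Vision Transformer trained with the DINO self-supervised learning method. It is mainly used as a backbone network for image classification and feature extraction.

Model features

Self-supervised learning

Self-supervised training with the DINO method lets the model learn effective visual representations without requiring large amounts of annotated data.

Vision Transformer architecture

Uses the standard ViT-B/16 architecture, which splits each input image into non-overlapping 16x16-pixel patches and processes them as a sequence of tokens.

Efficient feature extraction

Can be used as a feature-extraction backbone and outputs a 768-dimensional feature vector per image.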
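
The patch size and feature width above determine the token sequence the model works with. The following is a small arithmetic sketch, assuming a square 224 x 224 input and one extra class token in front of the patch tokens; the function name `vit_token_shape` is made up for illustration and is not part of timm.

```python
# Token arithmetic for a ViT-B/16 at 224 x 224 input (illustrative only).
IMAGE_SIZE = 224   # input height and width in pixels
PATCH_SIZE = 16    # each patch covers 16 x 16 pixels
EMBED_DIM = 768    # width of every token embedding in ViT-Base


def vit_token_shape(image_size: int, patch_size: int, embed_dim: int) -> tuple:
    """Return (num_tokens, embed_dim) for a square image, counting one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + 1, embed_dim             # +1 for the class token


print(vit_token_shape(IMAGE_SIZE, PATCH_SIZE, EMBED_DIM))  # (197, 768)
```

This matches the `(1, 197, 768)` shape of the unpooled output noted in the embedding example later in this card.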

Model capabilities

Image classification

Image feature extraction

Visual representation learning

Use cases

Computer vision

Image classification

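Classifies images and outputs ImageNet-1k class probabilities. Because DINO pretraining is label-free, this typically assumes a classifier head trained or fine-tuned on top of the backbone.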

Feature extraction

Extracts high-level feature representations of images for use in downstream tasks such as object detection and image retrieval.
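
As one illustration of the image-retrieval use case, embeddings can be compared with cosine similarity and a gallery ranked by score. The sketch below uses only the standard library and random 768-dimensional stand-in vectors rather than real model outputs; `cosine_similarity` and `rank_gallery` are hypothetical helper names, not timm functions.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


random.seed(0)
DIM = 768  # feature width stated for this model

# Stand-in embeddings: in practice these would come from the model.
gallery = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
# Hypothetical query: a slightly perturbed copy of gallery item 2.
query = [x + random.gauss(0, 0.1) for x in gallery[2]]

ranking = rank_gallery(query, gallery)
print(ranking[0])  # 2
```

For larger galleries, normalizing the embeddings once and using a matrix product or a vector index would be the more usual choice; the loop here is only meant to show the idea.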

🚀 Model card for vit_base_patch16_224.dino

A Vision Transformer (ViT) image feature model, trained with the self-supervised DINO method.
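
For orientation, the core of the DINO objective can be sketched in a few lines. This is a minimal, standard-library sketch under stated assumptions, not the training code from the original repository: a student and a teacher network each produce a score vector for different views of the same image, the teacher scores are centered and sharpened with a low temperature, and the student is trained to match them with a cross-entropy loss. The temperatures and momentum below are illustrative values, and the names `dino_loss` and `ema_update` are made up for this example.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher and the student."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_logp = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_logp))


def ema_update(old, new, momentum):
    """Exponential moving average, used here for the running center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


# Toy outputs for two views of one image (made-up numbers).
student_out = [0.2, 0.1, -0.3]
teacher_out = [0.25, 0.05, -0.2]
center = [0.0, 0.0, 0.0]

loss = dino_loss(student_out, teacher_out, center)
# With a real batch, the center would track the mean teacher output.
center = ema_update(center, teacher_out, momentum=0.9)
print(loss)
```

In the method as described in the DINO paper, the teacher weights are themselves a moving average of the student weights and no labels enter the loss, which is why the backbone can be trained without annotations.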

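🚀 Quick start

This Vision Transformer model can be used for image classification and feature extraction. Concrete usage examples follow.

✨ Key features

Applicable to image classification tasks.
Can produce image embeddings.

📦 Installation

Using this model requires the timm library, which can be installed with the following command.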

pip install timm

💻 Usage examples

Basic usage

Image classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_base_patch16_224.dino', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)

Image embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_base_patch16_224.dino',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 197, 768) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
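
The 197 tokens in the unpooled output can be related back to image regions. The sketch below assumes that token 0 is the class token and that the 196 patch tokens follow in row-major order, as is usual for ViT implementations; the helper `token_to_patch_box` is hypothetical and not part of timm. Coordinates refer to the preprocessed 224 x 224 input, not the original image.

```python
GRID = 224 // 16  # 14 patches per side


def token_to_patch_box(token_index, patch_size=16, grid=GRID):
    """Map a token index to its pixel box (left, top, right, bottom).

    Index 0 is assumed to be the class token, which has no spatial
    location, so None is returned for it.
    """
    if token_index == 0:
        return None
    if not 1 <= token_index <= grid * grid:
        raise IndexError("token index out of range")
    row, col = divmod(token_index - 1, grid)  # row-major patch order
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


print(token_to_patch_box(1))    # (0, 0, 16, 16)
print(token_to_patch_box(15))   # (0, 16, 16, 32)
print(token_to_patch_box(196))  # (208, 208, 224, 224)
```

A mapping like this is useful when per-patch features are needed, for example to build a 14 x 14 feature map for dense downstream tasks.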

📚 Documentation

Model details

Attribute	Details
Model type	Image classification / feature extraction backbone
Model stats	Params (M): 85.8; GMACs: 16.9; Activations (M): 16.5; Image size: 224 x 224
Papers	Emerging Properties in Self-Supervised Vision Transformers: https://arxiv.org/abs/2104.14294; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
Pretraining dataset	ImageNet-1k
Original repository	https://github.com/facebookresearch/dino

Model comparison

See timm's model results for this model's dataset and runtime metrics.

📄 License

This model is provided under the Apache-2.0 license.

🔖 Citation

@inproceedings{caron2021emerging,
  title={Emerging properties in self-supervised vision transformers},
  author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J{\'e}gou, Herv{\'e} and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={9650--9660},
  year={2021}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}