samvit_huge_patch16.sa1b: an open-source image feature model for feature extraction and fine-tuning

Samvit Huge Patch16.sa1b

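Developed by timm

An image feature model based on the Segment-Anything Vision Transformer (SAM ViT). It provides feature extraction and fine-tuning only; no segmentation head is included.

Image Segmentation

Transformers

Open-source license: Apache-2.0 #SA-1B pretraining #ViT backbone #1024 high resolution

Downloads: 131

Release date: 5/18/2023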

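Model Overview

This model is a Vision Transformer pretrained on the SA-1B dataset. It is intended mainly for image feature extraction and fine-tuning, and is particularly suited to segmentation tasks.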

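Model Features

Large-scale pretraining

Pretrained on the SA-1B dataset, which gives it strong feature extraction capability

Efficient architecture

Uses the Vision Transformer (ViT) architecture and processes images at 1024x1024 resolution

Versatile applications

Can be used for image classification and also as a feature extraction backbone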

Model Capabilities

Image feature extraction

Image classification

Image embedding generation

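Use Cases

Computer Vision

Image classification

Use this model to run image classification tasks

Feature extraction

Use as a backbone network that extracts image features for downstream tasks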

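🚀 Model card for samvit_huge_patch16.sa1b

A Segment-Anything Vision Transformer (SAM ViT) image feature model (note: for feature extraction and fine-tuning; the segmentation head is not included). It was pretrained by the paper authors on the SA-1B dataset for segmentation, with weights initialized from MAE.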

🚀 Quick Start

This model can be used for image classification and image embedding tasks. The sections below describe how to use it.

✨ Key Features

Works as a backbone for image classification and feature extraction.
Pretrained on the SA-1B dataset.

📦 Installation

This model requires the timm library, which can be installed with the following command.

pip install timm
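
To confirm that the install succeeded, the installed version can be queried from package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is a hypothetical name chosen for illustration and is not part of timm.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Reports the timm version if the package is installed in this environment.
timm_version = installed_version("timm")
print("timm:", timm_version if timm_version else "not installed")
```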

💻 Usage Examples

Basic Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('samvit_huge_patch16.sa1b', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
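
The last line above applies a softmax across the output dimension, scales the result to percentages, and keeps the five largest values together with their indices. The sketch below reproduces that computation in plain Python on made-up logits, so the result can be inspected without downloading the model. The helper name `top_k_percentages` and the example values are assumptions for illustration only. Whether the indices correspond to meaningful classes depends on the model having a trained classification head, which this card does not confirm for the pretrained weights.

```python
import math


def top_k_percentages(logits, k=5):
    """Return (percentages, indices) of the k largest softmax scores, highest first."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    percents = [100.0 * e / total for e in exps]
    # Indices ordered by descending percentage, truncated to the top k.
    order = sorted(range(len(logits)), key=lambda i: percents[i], reverse=True)[:k]
    return [percents[i] for i in order], order


# Made-up logits for a hypothetical 6-way output; not real model output.
example_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0]
top_values, top_indices = top_k_percentages(example_logits, k=5)

for value, index in zip(top_values, top_indices):
    print(f"index {index}: {value:.2f}%")
```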

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_huge_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 256, 64, 64) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
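
The spatial size of the unpooled feature map follows from the input resolution and the patch size. The sketch below works through that arithmetic, assuming square, non-overlapping 16x16 patches (taken from "patch16" in the model name) and the 256 output channels stated in the comment above. The helper `patch_grid` is a hypothetical name, not a timm function.

```python
def patch_grid(image_size: int, patch_size: int):
    """Return the (height, width) of the patch grid for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side


# 1024x1024 input split into 16x16 patches (assumed from the model name).
grid_h, grid_w = patch_grid(image_size=1024, patch_size=16)
num_positions = grid_h * grid_w

# 256 channels is taken from the shape comment in the example above.
feature_map_shape = (1, 256, grid_h, grid_w)

print(grid_h, grid_w, num_positions, feature_map_shape)
```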

📚 Documentation

Model Details

Property	Details
Model type	Image classification / feature backbone
Model stats	Params (M): 637.0; GMACs: 2982.2; Activations (M): 3428.2; Image size: 1024 x 1024
Papers	Segment Anything: https://arxiv.org/abs/2304.02643
	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
Original	https://github.com/facebookresearch/segment-anything
Pretraining dataset	SA-1B
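
As a rough guide to hardware requirements, the parameter count above can be converted into the memory needed to hold the weights. This is a back-of-the-envelope sketch that assumes every parameter is stored at a single precision. It covers weights only and ignores activations, which are likely to be substantial at 1024 x 1024 input. The function name `weight_memory_gb` is hypothetical.

```python
PARAMS_MILLIONS = 637.0  # parameter count from the model stats row
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(params_millions: float, bytes_per_param: int) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_millions * 1e6 * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: about {weight_memory_gb(PARAMS_MILLIONS, nbytes):.2f} GB")
```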

Model Comparison

For dataset and runtime metrics for this model, see the timm model results.

Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}