samvit_base_patch16.sa1b: an open-source image feature model for feature extraction and fine-tuning


Samvit Base Patch16.sa1b

Developed by timm

A Segment-Anything Vision Transformer (SAM ViT) image feature model. It provides feature extraction and fine-tuning only; the segmentation head is not included.

Image segmentation

Transformers

Open-source license: Apache-2.0 #ImageSegmentationFeatureExtraction #SA-1BPretraining #ViTBackbone

Downloads: 2,756

Released: 5/18/2023

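Model Overview

This is an image feature extraction model based on the Vision Transformer (ViT) architecture, used mainly for image classification and feature extraction. It was pretrained on the SA-1B dataset, starting from MAE weights, which makes it well suited to segmentation tasks.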

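Model Features

Efficient feature extraction

The model specializes in image feature extraction and can be applied to a range of downstream vision tasks.

Vision Transformer architecture

It uses the Vision Transformer (ViT) architecture and handles high-resolution images effectively.

Large-scale pretraining

It was pretrained on the SA-1B dataset and generalizes well.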

Model Capabilities

Image feature extraction

Image classification

Image embedding generation

Use Cases

Computer vision

Image classification

Classifies images and identifies the main content of an image.

Feature extraction

Extracts image features for use in downstream tasks.

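🚀 samvit_base_patch16.sa1b Model Card

This is a Segment-Anything Vision Transformer (SAM ViT) image feature model. (Note: it is intended for feature extraction and fine-tuning, and the segmentation head is not included.) It was pretrained by the paper authors on the SA-1B dataset, with MAE weights as the initialization.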

🚀 Quick Start

The model can be used for image classification and image embedding tasks. Concrete usage examples follow.

✨ Key Features

A backbone model suited to image classification and image feature extraction.
Pretrained on the SA-1B dataset, so it can also be used for segmentation tasks.

📦 Installation

The model is available through the timm library. For instructions on installing timm, see the official repository.
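As a quick check before running the examples, the sketch below reports whether timm is installed in the current environment. It assumes timm was installed from PyPI (typically with `pip install timm`, which the source does not state explicitly); `timm_version` is a hypothetical helper name used only for illustration.

```python
from importlib import metadata


def timm_version():
    """Return the installed timm version string, or None if timm is absent."""
    try:
        return metadata.version("timm")
    except metadata.PackageNotFoundError:
        return None


version = timm_version()
if version is None:
    print("timm is not installed; see the official repository for instructions")
else:
    print(f"timm {version} is installed")
```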

💻 Usage Examples

Basic usage

Image classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # required for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('samvit_base_patch16.sa1b', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

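# convert scores to percentages and keep the five highest; since the model is
# intended for feature extraction and fine-tuning, these are only meaningful
# once a classification head has been trained
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)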
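The last line of the example combines a softmax, a scaling to percentages, and a top-k selection. The following is a minimal pure-Python sketch of that same computation, written only to make the arithmetic explicit; the `logits` values are made up, and `softmax_percent` and `top_k` are hypothetical helper names, not timm or PyTorch functions.

```python
import math


def softmax_percent(logits):
    """Turn raw scores into percentages that sum to 100."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.0]  # hypothetical scores for 6 classes
top5_probabilities, top5_class_indices = top_k(softmax_percent(logits), k=5)
print(top5_class_indices)
```

In the tensor version, `dim=1` applies the softmax across the class dimension of each image in the batch, so each row is normalized independently.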

Image embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_base_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 256, 64, 64) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
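One common way to use such embeddings is to compare two images by cosine similarity. The sketch below shows the calculation on plain Python lists that stand in for two `(num_features,)` vectors; the vectors are made up and `cosine_similarity` is an illustrative helper, not part of timm. With real tensors, `torch.nn.functional.cosine_similarity` performs the same computation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones would come from output[0].tolist()
embedding_a = [1.0, 2.0, 0.0]
embedding_b = [2.0, 4.0, 0.0]   # same direction as embedding_a
embedding_c = [-2.0, 1.0, 0.0]  # orthogonal to embedding_a

same_direction = cosine_similarity(embedding_a, embedding_b)
orthogonal = cosine_similarity(embedding_a, embedding_c)
print(same_direction, orthogonal)
```

Values close to 1 indicate embeddings pointing in nearly the same direction, and values near 0 indicate unrelated directions.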

📚 Documentation

Model details

Attribute	Details
Model type	Image classification / feature extraction backbone
Parameters (M)	89.7
GMACs	486.4
Activations (M)	1343.3
Image size	1024 x 1024
Related papers	Segment Anything: https://arxiv.org/abs/2304.02643; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
Original repository	https://github.com/facebookresearch/segment-anything
Pretraining dataset	SA-1B
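The figures in the table can be related to one another with a little arithmetic. The sketch below derives the patch grid from the image size and the patch size (taken from "patch16" in the model name), and estimates the memory needed for the weights. The 4 bytes per parameter is an assumption (fp32 storage), not something the source states.

```python
image_size = 1024       # pixels per side, from the table
patch_size = 16         # from "patch16" in the model name
params_millions = 89.7  # from the table
bytes_per_param = 4     # assumption: weights stored as 32-bit floats

# Each 16 x 16 patch becomes one token, so the grid has 1024 / 16 cells per side.
grid_side = image_size // patch_size
num_patches = grid_side * grid_side

# Millions of parameters times bytes per parameter gives megabytes directly.
fp32_weight_megabytes = params_millions * bytes_per_param

print(f"patch grid: {grid_side} x {grid_side} ({num_patches} patches)")
print(f"approximate fp32 weight size: {fp32_weight_megabytes:.1f} MB")
```

The 64 x 64 grid agrees with the spatial dimensions of the `(1, 256, 64, 64)` tensor that `forward_features` returns in the embedding example.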

Model comparison

For dataset and runtime metrics for this model, see the timm model results.

Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}