samvit_base_patch16.sa1b open-source image feature model - free feature extraction and fine-tuning

Samvit Base Patch16.sa1b

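Developed by timm

A Segment-Anything Vision Transformer (SAM ViT) image feature model. It provides feature extraction and fine-tuning only and does not include a segmentation head.

Image segmentation

Transformers

License: Apache-2.0 #Image segmentation feature extraction #SA-1B pretraining #ViT backbone

Downloads: 2,756

Released: 5/18/2023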

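Model overview

This is an image feature extraction model built on the Vision Transformer (ViT) architecture, used mainly for image classification and feature extraction. The paper authors pretrained it on the SA-1B dataset for segmentation, initializing from MAE weights.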

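Model features

Efficient feature extraction

The model focuses on image feature extraction and suits a range of downstream vision tasks.

Vision Transformer architecture

Built on the Vision Transformer (ViT) architecture and takes high-resolution 1024 x 1024 inputs.

Large-scale pretraining

Pretrained on the large-scale SA-1B dataset, which should help it generalize to downstream tasks.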

Model capabilities

Image feature extraction

Image classification

Image embedding generation

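Use cases

Computer vision

Image classification

Can be used to classify images and identify their main content.

Feature extraction

Can be used to extract image features for downstream tasks.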

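🚀 samvit_base_patch16.sa1b model card

A Segment-Anything Vision Transformer (SAM ViT) image feature model (note: for feature extraction and fine-tuning; the segmentation head is not included). The paper authors pretrained it on the SA-1B dataset for segmentation, initializing from MAE weights.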

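🚀 Quick start

This is a Transformer-based image feature extraction model that can be used for tasks such as image classification and obtaining image embeddings.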

✨ Key features

Model type: Image classification / feature backbone
Model statistics:
- Parameters (M): 89.7
- GMACs: 486.4
- Activations (M): 1343.3
- Image size: 1024 x 1024
Papers:
- Segment Anything
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original code repository: https://github.com/facebookresearch/segment-anything
Pretraining dataset: SA-1B
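
The image size listed above, together with the patch size, determines how large a token grid the backbone works on. A minimal sketch of that arithmetic, assuming the patch size of 16 implied by "patch16" in the model name (the variable names are illustrative and not part of timm):

```python
# Illustrative arithmetic only; these names are not timm APIs.
image_size = 1024   # input resolution listed in the model statistics
patch_size = 16     # assumed from "patch16" in the model name

# Each non-overlapping 16 x 16 patch becomes one token.
grid_side = image_size // patch_size
num_tokens = grid_side * grid_side

print(grid_side, num_tokens)
```

A grid side of 64 agrees with the (1, 256, 64, 64) feature map shape noted in the image embedding example.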

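📦 Installation

The model card does not give installation steps. To use the model, follow the official installation instructions for the timm library.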
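
timm is published on PyPI, so `pip install timm` is the usual route, and the usage examples also rely on PyTorch and Pillow. As a quick check before running them, the sketch below reports which of these packages can be imported in the current environment. The helper name `check_packages` is hypothetical and not part of any library.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names used by the usage examples: timm, torch, and PIL (Pillow).
status = check_packages(["timm", "torch", "PIL"])
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```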

💻 Usage examples

Basic usage

Image classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('samvit_base_patch16.sa1b', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

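# Top-5 scores (as percentages) and their indices. This checkpoint is a feature
# backbone, so the scores are meaningful only once a classification head has been fine-tuned.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)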
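
The last line of the example converts the model output to percentages with a softmax and keeps the five largest values. For readers unfamiliar with those two steps, here is a plain-Python sketch of the same computation on a made-up list of scores. `softmax_percent` and `top_k` are illustrative helpers, not timm or PyTorch functions.

```python
import math


def softmax_percent(scores):
    """Softmax over a list of floats, scaled to percentages."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Made-up scores for six hypothetical classes.
scores = [0.5, 2.0, -1.0, 3.5, 0.0, 1.0]
probs, indices = top_k(softmax_percent(scores), k=5)
print(indices)
```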

Image embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_base_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 256, 64, 64) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
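
One common use of the pooled embedding is comparing two images by cosine similarity. The sketch below shows the computation on short made-up vectors in plain Python. With real model outputs one would more likely call `torch.nn.functional.cosine_similarity` on the tensors. The `cosine_similarity` function defined here is a local illustrative helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional stand-ins for two image embeddings.
embedding_a = [1.0, 0.0, 2.0, 0.0]
embedding_b = [2.0, 0.0, 4.0, 0.0]
similarity = cosine_similarity(embedding_a, embedding_b)
print(similarity)
```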

📚 Documentation

Explore the dataset and runtime metrics of this model in the timm model results.

📄 License

This project is licensed under Apache-2.0.

📚 Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}