🚀 samvit_large_patch16.sa1b Model Card
A Segment-Anything Vision Transformer (SAM ViT) image feature model (NOTE: for feature extraction and fine-tuning; the segmentation head is not included). Initialized from MAE weights and pretrained for segmentation on the SA-1B dataset by the paper authors.
🚀 Quick Start
This is a Transformer-based image feature model that can be used for image classification and feature extraction. The sections below show how to run image classification and how to obtain image embeddings with it.
✨ Key Features
- Model type: image classification / feature backbone
- Model stats:
  - Params (M): 308.3
  - GMACs: 1493.9
  - Activations (M): 2553.8
  - Image size: 1024 x 1024
- Papers:
  - Segment Anything: https://arxiv.org/abs/2304.02643
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
- Original repository: https://github.com/facebookresearch/segment-anything
- Pretraining dataset: SA-1B
| Attribute | Details |
|-----------|---------|
| Model type | Image classification / feature backbone |
| Pretraining dataset | SA-1B |
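As a quick sanity check on the stats above (a hedged sketch, not part of the original card): a 1024 x 1024 input split into 16 x 16 patches yields a 64 x 64 token grid.

```python
# Patch grid implied by the model stats above (image size 1024, patch size 16).
image_size = 1024
patch_size = 16

grid = image_size // patch_size   # tokens per side
num_patches = grid * grid         # total patch tokens

print(grid, num_patches)  # 64 4096
```

This grid size also matches the spatial resolution of the unpooled feature map produced by `forward_features` in the embedding example below.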
📦 Installation
The original documentation does not include installation steps; if needed, refer to the official installation instructions for the timm library.
💻 Usage Examples
Basic Usage
Image Classification
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('samvit_large_patch16.sa1b', pretrained=True)
model = model.eval()

# get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
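The final line converts the logits to percentage probabilities and keeps the five largest. The same softmax / top-k step can be sketched in plain Python (with toy logits, not real model outputs):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk(values, k):
    """Return (values, indices) of the k largest entries, like torch.topk."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order

logits = [2.0, 0.5, 1.0, 3.0, -1.0]          # toy logits, not real model output
probs = [p * 100 for p in softmax(logits)]   # percentages, as in the example above
top_probs, top_idx = topk(probs, k=2)
print(top_idx)  # [3, 0]
```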
Image Embeddings
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'samvit_large_patch16.sa1b',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is an unpooled feature map

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
```
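A common use of the pooled `(1, num_features)` embedding is image-to-image similarity. A minimal cosine-similarity sketch, with plain Python lists standing in for two hypothetical embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-ins for two pooled embeddings from model.forward_head(..., pre_logits=True)
emb_a = [0.2, -0.5, 0.8, 0.1]
emb_b = [0.19, -0.45, 0.75, 0.12]

print(cosine_similarity(emb_a, emb_b))  # close to 1.0 for similar images
```

In practice you would L2-normalize real embeddings once and compare many pairs with dot products.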
📚 Documentation
You can explore this model's dataset and runtime metrics in the timm model results.
📄 License
This project is released under the Apache-2.0 license.
📚 Citation
```bibtex
@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
```