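🚀 Model card for vit_huge_patch14_224.mae
A Vision Transformer (ViT) image feature model, pretrained on ImageNet-1k with the self-supervised Masked Autoencoder (MAE) method. It can be used for tasks such as image feature extraction.
🚀 Quick Start
The examples below load the pretrained weights through timm and run the model on a sample image.
💻 Usage Examples
Basic Usage
Image Classification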
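from urllib.request import urlopen
from PIL import Image
import timm
import torch  # required for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_huge_patch14_224.mae', pretrained=True)
model = model.eval()

# get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into a batch of 1

# Assumption: an MAE-pretrained checkpoint may ship without a fine-tuned classifier head,
# in which case these scores are not meaningful class probabilities.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)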
Image Embeddings
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_huge_patch14_224.mae',
    pretrained=True,
    num_classes=0,  # remove the classifier layer so the model returns pooled features
)
model = model.eval()

# get model-specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))  # unpooled token features
output = model.forward_head(output, pre_logits=True)  # pooled, (1, num_features) shaped tensor
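A common next step with pooled embeddings is to compare two images by cosine similarity. The sketch below is a dependency-free illustration of that comparison, not part of the original model card: the helper `cosine_similarity` and the toy four-dimensional vectors are hypothetical stand-ins. With real model output, one would typically convert each embedding with `output.squeeze(0).tolist()` or use `torch.nn.functional.cosine_similarity` directly on the tensors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for real embeddings (assumed values, for illustration only)
emb_a = [1.0, 0.0, 2.0, 0.0]
emb_b = [2.0, 0.0, 4.0, 0.0]  # same direction as emb_a, so similarity is close to 1
emb_c = [0.0, 3.0, 0.0, 1.0]  # orthogonal to emb_a, so similarity is 0

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(sim_ab, sim_ac)
```

Because cosine similarity ignores vector length, it compares the direction of two embeddings only, which is why `emb_a` and its scaled copy `emb_b` score as near-identical.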
📚 Documentation
Model Details
| Property | Details |
|---|---|
| Model type | Image classification / feature backbone |
| Model stats | Params (M): 630.8; GMACs: 167.4; Activations (M): 139.4; Image size: 224 x 224 |
| Papers | Masked Autoencoders Are Scalable Vision Learners: https://arxiv.org/abs/2111.06377; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2 |
| Pretraining dataset | ImageNet-1k |
| Original repository | https://github.com/facebookresearch/mae |
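The image size in the table and the patch size in the model name together determine how many tokens the transformer processes per image. The sketch below works through that arithmetic. It assumes a plain ViT with non-overlapping square patches and a single prepended class token; `vit_token_count` is a hypothetical helper, not a timm function.

```python
def vit_token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Tokens seen by a plain ViT for a square image (hypothetical helper).

    Assumes non-overlapping square patches plus `num_prefix_tokens`
    extra tokens (one class token by default).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + num_prefix_tokens


# 224 x 224 input with 14 x 14 patches: 16 * 16 patches + 1 class token
tokens = vit_token_count(image_size=224, patch_size=14)
print(tokens)  # 257
```

The smaller patch size gives a longer sequence than a patch-16 model at the same resolution (197 tokens under the same assumptions), which adds to the compute cost per image.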
Model Comparison
You can explore this model's dataset and runtime metrics in the timm model results.
Citation
@Article{MaskedAutoencoders2021,
author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
journal = {arXiv:2111.06377},
title = {Masked Autoencoders Are Scalable Vision Learners},
year = {2021},
}
@article{dosovitskiy2020vit,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={ICLR},
year={2021}
}
@misc{rw2019timm,
author = {Ross Wightman},
title = {PyTorch Image Models},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.4414861},
howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}
📄 License
This model is released under the CC-BY-NC-4.0 license.