vit_base_patch14_reg4_dinov2.lvd142m: Open-Source Image Feature Model - Free, Pretrained, Accurate Image Feature Extraction


Vit Base Patch14 Reg4 Dinov2.lvd142m

Developed by timm

A Vision Transformer (ViT) image feature model with registers, pretrained on the LVD-142M dataset with the self-supervised DINOv2 method.

Image Classification

Transformers

License: Apache-2.0 #self-supervised visual features #register-enhanced ViT #large-scale image processing

Downloads: 40.95k

Release date: 10/30/2023

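Model Overview

This model is an image feature extraction backbone based on the Vision Transformer (ViT) architecture, with additional register tokens to improve performance. It is mainly intended for image classification and feature extraction.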

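Model Features

Register Enhancement

Adds learnable register tokens to the token sequence; the "Vision Transformers Need Registers" paper reports that they remove artifacts from feature and attention maps and improve ViT performance

Self-Supervised Pretraining

Pretrained on the LVD-142M dataset with the self-supervised DINOv2 learning method

Large Input Support

Supports large 518×518-pixel image inputs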
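The register and input-size features together determine the length of the token sequence the transformer processes. The following is a minimal plain-Python sketch, assuming timm's ordering of one class token, then the 4 register tokens, then the patch tokens. At 518×518 with 14×14 patches there are 37 × 37 = 1369 patch tokens, so 1 + 4 + 1369 = 1374 tokens in total. The helper name `token_layout` is hypothetical and not part of timm.

```python
# Hypothetical helper: token_layout is not a timm function.
# Assumed ordering (as in timm's VisionTransformer): [class token, register tokens, patch tokens].


def token_layout(img_size: int = 518, patch_size: int = 14, num_reg: int = 4) -> dict:
    """Count tokens and return the index range of each token group."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size      # patches per side: 518 // 14 = 37
    num_patches = grid * grid          # 37 * 37 = 1369
    num_prefix = 1 + num_reg           # class token + register tokens
    return {
        "grid": grid,
        "num_patches": num_patches,
        "total_tokens": num_prefix + num_patches,
        "cls": range(0, 1),
        "registers": range(1, num_prefix),
        "patches": range(num_prefix, num_prefix + num_patches),
    }


layout = token_layout()
print(layout["grid"], layout["num_patches"], layout["total_tokens"])  # 37 1369 1374
```

The registers take part in self-attention but carry no spatial position. The pooled embedding is typically taken from the class token or from an average over patch tokens, so register outputs are usually dropped.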

Model Capabilities

Image feature extraction

Image classification

Image embedding generation
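Image embeddings from the backbone (768-dimensional for a ViT-Base model) are commonly compared with cosine similarity, for example for image retrieval. The card does not prescribe a similarity measure, so treat this plain-Python sketch as an illustration only. It uses toy 3-D vectors in place of real model outputs, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Indices of gallery embeddings, most similar to the query first."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy 3-D vectors standing in for the model's 768-D embeddings.
query = [1.0, 0.0, 0.0]
gallery = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
print(rank_by_similarity(query, gallery))  # [1, 0, 2]
```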

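Use Cases

Computer Vision

Image Classification

Can be used for general image classification tasks, e.g. by training a classifier on top of its features

Feature Extraction

Can serve as a backbone that provides feature representations for downstream vision tasks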
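One common way to use a frozen self-supervised backbone for a downstream task is a k-nearest-neighbour classifier over its embeddings. This is a standard evaluation for DINO-style models, but it is not something the card itself specifies. The following is a plain-Python sketch with toy 2-D features in place of real 768-D embeddings. `knn_predict` and `_unit` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def knn_predict(query, train_feats, train_labels, k: int = 3):
    """Predict a label by majority vote of the k most cosine-similar training features."""
    q = _unit(query)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, _unit(f))), label) for f, label in zip(train_feats, train_labels)),
        key=lambda s: s[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]  # ties go to the label seen first, i.e. the closer one


# Toy 2-D features standing in for frozen 768-D backbone embeddings.
train_feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = ["cat", "cat", "dog", "dog"]
print(knn_predict([1.0, 0.0], train_feats, train_labels))  # cat
```

A linear classifier trained on the same frozen features is the other usual choice. Neither approach updates the backbone weights.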

🚀 vit_base_patch14_reg4_dinov2.lvd142m Model Card

This is a Vision Transformer (ViT) image feature model with registers, pretrained on the LVD-142M dataset using the self-supervised DINOv2 method.
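DINOv2 pretraining is a form of self-distillation. A student network is trained to match the output distribution of a teacher network, and the teacher's weights are an exponential moving average of the student's. The following is a simplified plain-Python sketch of the image-level DINO-style loss only, not the DINOv2 training code. DINOv2 additionally uses an iBOT masked-patch loss, a KoLeo regularizer and Sinkhorn-Knopp centering. The function names and the temperature and momentum defaults are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]


def dino_loss(student_logits, teacher_logits, center, student_temp=0.1, teacher_temp=0.04) -> float:
    """Cross-entropy between the centered, sharpened teacher and the student distributions."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(x) for x in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def update_center(center, teacher_batch, momentum=0.9) -> List[float]:
    """EMA of the mean teacher output; centering plus sharpening helps avoid collapse."""
    batch_mean = [sum(col) / len(teacher_batch) for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]


# Illustrative call with toy 3-way outputs (real heads have many more dimensions).
loss = dino_loss([1.5, 0.7, -0.8], [2.0, 0.5, -1.0], center=[0.0, 0.0, 0.0])
```

In a real training loop only the student receives gradients from this loss. The teacher and the center are updated by moving averages.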

🚀 Quick Start

This model can be used for image classification and for extracting image embeddings. Usage examples are given below.

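✨ Model Details

Model type: Image classification / feature backbone
Model stats:
- Params (M): 86.6
- GMACs: 117.5
- Activations (M): 115.0
- Image size: 518 x 518
Papers:
- Vision Transformers Need Registers: arXiv:2309.16588
- DINOv2: Learning Robust Visual Features without Supervision: arXiv:2304.07193
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: ICLR 2021
Original code base: https://github.com/facebookresearch/dinov2
Pretrain dataset: LVD-142M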
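The 86.6M parameter figure can be checked with a back-of-the-envelope count. The sketch below rests on assumptions not stated in the card: standard pre-norm ViT-Base blocks with biases, an MLP ratio of 4, two LayerScale vectors per block as used by DINOv2, no classifier head, and position embeddings for the 1369 patch tokens only. Under these assumptions the count comes to 86,582,016, which rounds to the stated 86.6M. `vit_param_count` is a hypothetical helper.

```python
# Hypothetical helper; the architecture assumptions are listed in the text above.


def vit_param_count(embed_dim: int = 768, depth: int = 12, mlp_ratio: int = 4,
                    patch_size: int = 14, in_chans: int = 3, num_patches: int = 37 * 37,
                    num_reg: int = 4, layer_scale: bool = True) -> int:
    d, h = embed_dim, embed_dim * mlp_ratio
    per_block = (
        (3 * d * d + 3 * d)               # qkv projection (weights + bias)
        + (d * d + d)                     # attention output projection
        + (d * h + h)                     # MLP fc1
        + (h * d + d)                     # MLP fc2
        + 2 * (2 * d)                     # two LayerNorms (weight + bias)
        + (2 * d if layer_scale else 0)   # two LayerScale vectors
    )
    patch_embed = in_chans * patch_size * patch_size * d + d   # conv weight + bias
    prefix_tokens = d + num_reg * d                            # class token + register tokens
    pos_embed = num_patches * d   # assumption: position embeddings cover patch tokens only
    final_norm = 2 * d
    return depth * per_block + patch_embed + prefix_tokens + pos_embed + final_norm


total = vit_param_count()
print(total, round(total / 1e6, 1))  # 86582016 86.6
```

About 85.1M of the total sits in the 12 transformer blocks. Most of the remainder is the position embedding table (about 1.05M) and the patch embedding (about 0.45M).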


💻 Usage Examples

Basic Usage

Image Classification

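from urllib.request import urlopen

import timm
import torch  # needed for torch.no_grad and torch.topk below
from PIL import Image

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
)).convert('RGB')  # ensure 3 channels to match the normalization

model = timm.create_model('vit_base_patch14_reg4_dinov2.lvd142m', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

with torch.no_grad():  # inference only, no gradients needed
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# If the checkpoint has no classifier head (model.num_classes == 0), `output` holds
# pooled features rather than class logits, so the top-5 below are not class predictions.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)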
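The transform returned by `create_transform` resizes and crops the image to the model's input size (518×518 here) and normalizes each color channel. The following is a plain-Python sketch of that normalization step for a single pixel. It assumes the ImageNet default mean and standard deviation, so print `data_config` to confirm the values timm actually uses for this model. `normalize_pixel` is a hypothetical helper.

```python
from typing import Sequence, Tuple

# Assumed values: ImageNet default mean/std. Check data_config for the real ones.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int], mean=IMAGENET_MEAN, std=IMAGENET_STD) -> Tuple[float, ...]:
    """Scale 0-255 RGB values to [0, 1], then standardize each channel: (x - mean) / std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std))


print(normalize_pixel((255, 255, 255)))  # a white pixel maps to roughly (2.25, 2.43, 2.64)
```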

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
)).convert('RGB')  # ensure 3 channels to match the normalization

model = timm.create_model(
    'vit_base_patch14_reg4_dinov2.lvd142m',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1374, 768) shaped tensor:
# 1 class token + 4 register tokens + 37 x 37 = 1369 patch tokens (518 / 14 = 37)

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
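For dense uses of the unpooled `forward_features` output, such as heat maps or segmentation, it helps to know which image region each token covers. The following is a plain-Python sketch that maps a token index to its pixel box. It assumes the 5 prefix tokens (class token plus 4 registers) come first and that patch tokens are flattened row by row over the 37 × 37 grid. `patch_box` is a hypothetical helper.

```python
from typing import Tuple


def patch_box(token_index: int, grid: int = 37, patch_size: int = 14,
              num_prefix: int = 5) -> Tuple[int, int, int, int]:
    """Map a token index of forward_features output to its (left, top, right, bottom) pixel box."""
    patch_index = token_index - num_prefix           # skip class + register tokens
    if not 0 <= patch_index < grid * grid:
        raise ValueError("token_index is a class/register token or out of range")
    row, col = divmod(patch_index, grid)             # assumed row-major patch order
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


print(patch_box(5))     # (0, 0, 14, 14): first patch, top-left corner
print(patch_box(1373))  # (504, 504, 518, 518): last patch, bottom-right corner
```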

📚 Detailed Documentation

Explore the dataset and runtime metrics of this model in timm's model results.


📄 License

This model is released under the Apache-2.0 license.

📚 Citation

@article{darcet2023vision,
  title={Vision Transformers Need Registers},
  author={Darcet, Timoth{\'e}e and Oquab, Maxime and Mairal, Julien and Bojanowski, Piotr},
  journal={arXiv preprint arXiv:2309.16588},
  year={2023}
}

@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}