vit_giant_patch14_reg4_dinov2.lvd142m: Open-Source Image Feature Model

Developed by timm

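A Vision Transformer (ViT) image feature model with registers, pretrained on the LVD-142M dataset with the self-supervised DINOv2 method.

Image Classification

Transformers

Open Source License: Apache-2.0 #self-supervised visual features #large-scale image embeddings #register-augmented ViT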

Release Time: 10/30/2023

Model Overview

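The model is primarily intended for image classification and feature extraction. It is based on the Vision Transformer architecture and pretrained with self-supervised learning on a large dataset.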

Model Features

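Register augmentation

The model uses register tokens, which improve the performance and stability of the Vision Transformer.

Self-supervised learning

Pretrained on the LVD-142M dataset with the DINOv2 self-supervised learning method.

Large-scale pretraining

Pretraining on the large-scale LVD-142M dataset gives the model strong feature extraction capability.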

Model Capabilities

Image feature extraction

Image classification

Visual representation learning

Use Cases

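Computer vision

Image classification

Can be used to classify images across multiple categories.

Performs well on multiple benchmark datasets

Feature extraction

Can serve as a feature extractor for downstream vision tasks.

The extracted features can be used for tasks such as object detection and image segmentation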
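One simple downstream use of extracted features is similarity search: embed a query image and a set of candidate images, then rank the candidates by cosine similarity. The sketch below illustrates only the ranking step. The 4-dimensional vectors are invented stand-ins (the real pooled embeddings are much larger), and `cosine_similarity`, `query`, and `gallery` are hypothetical names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings; in practice these would come from the model's pooled output.
query = [0.2, 0.9, 0.1, 0.4]
gallery = {
    "image_a": [0.4, 1.8, 0.2, 0.8],      # same direction as the query
    "image_b": [0.9, -0.1, 0.5, 0.0],
    "image_c": [-0.2, -0.9, -0.1, -0.4],  # opposite direction
}

# Pick the gallery entry whose embedding is most similar to the query.
best = max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))
print(best)
```

Cosine similarity ignores vector length, so two embeddings that point in the same direction score as identical even if their magnitudes differ.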

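🚀 vit_giant_patch14_reg4_dinov2.lvd142m Model Card

A Vision Transformer (ViT) image feature model with registers, pretrained on the LVD-142M dataset with the self-supervised DINOv2 method.

🚀 Quick Start

This model can be used for image classification and for extracting image embeddings. The usage examples below show how.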

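✨ Key Features

Model type: image classification / feature backbone
Model stats:
- Params (M): 1136.5
- GMACs: 1558.1
- Activations (M): 874.4
- Image size: 518 x 518
Papers:
- Vision Transformers Need Registers (arXiv:2309.16588)
- DINOv2: Learning Robust Visual Features without Supervision (arXiv:2304.07193)
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021)
Original code: https://github.com/facebookresearch/dinov2
Pretraining dataset: LVD-142M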
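As a rough illustration of what the parameter count in the model statistics implies, the sketch below estimates the memory needed just to hold the weights at different numeric precisions. The byte widths per dtype are standard, but the helper name `weight_memory_gb` is made up for this example, and the estimate deliberately ignores activations and framework overhead.

```python
# Rough weight-memory estimate from the listed parameter count.
# Assumption: weights only; activations and runtime overhead are not counted.

PARAMS_MILLIONS = 1136.5  # parameter count (M) from the model statistics

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(params_millions: float, dtype: str) -> float:
    """Return the approximate weight size in decimal gigabytes (1e9 bytes)."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(PARAMS_MILLIONS, dtype):.2f} GB")
```

Actual memory use at inference time will be higher, since a 518 x 518 input produces a long token sequence and correspondingly large activations.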

📦 Installation

The original model card does not list installation steps. To use the timm library, install it with:

pip install timm
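After installing, it can be useful to confirm that the packages the usage examples import are actually available in the current environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and it assumes the distribution name matches the import name, which holds for `timm` and `torch`.

```python
# Report whether a package is importable and, if so, which version is installed.
from importlib import metadata, util


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("timm", "torch"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```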

💻 Usage Examples

Basic Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # required for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_giant_patch14_reg4_dinov2.lvd142m', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
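To make the final line concrete, here is a dependency-free sketch of what `output.softmax(dim=1) * 100` followed by `torch.topk(..., k=5)` computes for a single row of scores. The logit values are invented for illustration, and `softmax_percent` and `top_k` are hypothetical helper names, not part of timm or PyTorch.

```python
import math


def softmax_percent(logits):
    """Softmax over a list of logits, scaled to percentages."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def top_k(values, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


logits = [0.1, 2.3, -1.0, 0.7, 1.5, 0.0, 3.1]  # made-up scores for 7 classes
probs = softmax_percent(logits)
top5_probabilities, top5_class_indices = top_k(probs, k=5)

for p, idx in zip(top5_probabilities, top5_class_indices):
    print(f"class {idx}: {p:.1f}%")
```

The indices refer to positions in the model's output vector; mapping them to human-readable labels requires a label list that matches the classifier head in use.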

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_giant_patch14_reg4_dinov2.lvd142m',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1374, 1536) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
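The 1374 in the unpooled shape can be reproduced from the model name and statistics: a 518 x 518 image cut into 14 x 14 patches gives a 37 x 37 grid, plus one class token and four register tokens. The sketch below works through that arithmetic. It assumes the prefix tokens (class, then registers) come before the patch tokens and that patches are ordered row by row; `patch_position` is a hypothetical helper, not a timm function.

```python
# Token bookkeeping for a ViT with register tokens.
# Assumptions (read off the model name and statistics, not queried from timm):
# 518x518 input, 14x14 patches, 1 class token, 4 register tokens,
# prefix tokens first, then patch tokens in row-major order.

IMG_SIZE = 518
PATCH_SIZE = 14
NUM_CLASS_TOKENS = 1
NUM_REGISTER_TOKENS = 4

grid = IMG_SIZE // PATCH_SIZE                      # patches per side
num_patches = grid * grid                          # patch tokens
num_prefix = NUM_CLASS_TOKENS + NUM_REGISTER_TOKENS
num_tokens = num_prefix + num_patches              # sequence length


def patch_position(token_index: int):
    """Map a token index to (row, col) in the patch grid, or None for prefix tokens."""
    if token_index < num_prefix:
        return None
    return divmod(token_index - num_prefix, grid)


if __name__ == "__main__":
    print(grid, num_patches, num_tokens)
```

Under these assumptions, dropping the first `num_prefix` tokens and reshaping the rest to `grid` x `grid` recovers a spatial feature map, which is the usual starting point for dense tasks such as segmentation.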

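📚 Documentation

Dataset and runtime metrics for this model are available in the timm model results.

📄 License

This project is released under the Apache-2.0 license.

📚 Citation

If you use this model, please cite the relevant papers as follows: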

@article{darcet2023vision,
  title={Vision Transformers Need Registers},
  author={Darcet, Timoth{\'e}e and Oquab, Maxime and Mairal, Julien and Bojanowski, Piotr},
  journal={arXiv preprint arXiv:2309.16588},
  year={2023}
}

@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and  Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}