vit_giant_patch14_dinov2.lvd142m: open-source image feature extraction model

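Developed by timm

A giant image feature extraction model based on the Vision Transformer (ViT), pretrained on the LVD-142M dataset with the self-supervised DINOv2 method

Image classification

Transformers

License: Apache-2.0 #self-supervised-visual-features #large-image-processing #DINOv2-pretraining

Downloads: 6,911

Released: 5/9/2023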

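Model overview

This is a giant Vision Transformer model intended for image feature extraction and image classification. It is pretrained on a large dataset with the self-supervised DINOv2 method and produces high-quality image representations.

Model features

Self-supervised pretraining

Pretrained on the LVD-142M dataset with the self-supervised DINOv2 method, so no human-annotated labels are required

Giant model architecture

Based on the ViT-Giant architecture with 1,136.5 million parameters (about 1.1 billion), which lets it capture richer image features

High-resolution processing

Accepts 518×518-pixel input images, which suits visual content with fine detail

Multi-purpose output

Can output classification results or raw image feature embeddings, so it fits a range of downstream tasks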

Model capabilities

Image feature extraction

Image classification

Image embedding generation

Visual content understanding

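Use cases

Computer vision

Image classification

Classify an input image and output the most likely category

Performs well on a range of vision benchmarks

Feature extraction

Extract deep feature representations of images for downstream tasks

The resulting high-quality features can be used for retrieval, matching, and similar tasks

Content understanding

Visual content analysis

Analyze image content to identify its visual elements and scenes

Captures high-level semantic information in images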
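
The retrieval and matching use case listed above usually comes down to comparing embedding vectors. The following is a minimal sketch of cosine-similarity ranking in plain Python. The helper names (`cosine_similarity`, `rank_by_similarity`) and the 4-dimensional toy vectors are illustrative assumptions, not part of timm; real embeddings from this model would be 1536-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, gallery):
    """Return gallery keys ordered from most to least similar to the query."""
    scores = {name: cosine_similarity(query, vec) for name, vec in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for real image embeddings (hypothetical data).
gallery = {
    "img_a": [0.9, 0.1, 0.0, 0.2],
    "img_b": [0.0, 1.0, 0.3, 0.0],
    "img_c": [0.8, 0.2, 0.1, 0.1],
}
query = [1.0, 0.0, 0.0, 0.1]

ranking = rank_by_similarity(query, gallery)
print(ranking)
```

Cosine similarity ignores vector length, so it is a common default for comparing embeddings; for large galleries a vectorized or approximate nearest-neighbor search would normally replace this loop.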

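🚀 vit_giant_patch14_dinov2.lvd142m model card

A Vision Transformer (ViT) image feature model, pretrained on the LVD-142M dataset with the self-supervised DINOv2 method. It can be used for image feature extraction and related tasks.

🚀 Quick start

This model can be used for image classification and image embedding extraction. Usage examples follow.

💻 Usage examples

Basic usage

Image classification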

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # required for torch.no_grad and torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_giant_patch14_dinov2.lvd142m', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

# run inference without tracking gradients
with torch.no_grad():
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# Note: as a self-supervised backbone, this checkpoint may ship without a trained
# classifier head (check model.num_classes); in that case `output` holds pooled
# features rather than class logits.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
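
To make the final `softmax` and `topk` step concrete, here is a small plain-Python sketch of the same computation on a toy list of logits. It assumes the model output really is a vector of class logits; the helper names and the toy values are hypothetical and only illustrate what the two returned tensors contain.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_percent(logits, k):
    """Return (percentages, indices) of the k most probable entries."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] * 100 for i in order], order


# Toy logits for a hypothetical 4-class problem.
toy_logits = [2.0, 0.5, 3.0, -1.0]
percentages, indices = top_k_percent(toy_logits, k=2)
print(indices, percentages)
```

The first returned list plays the role of `top5_probabilities` (values in percent, largest first) and the second the role of `top5_class_indices`.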

Image embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_giant_patch14_dinov2.lvd142m',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1370, 1536) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
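
The 1370 in the unpooled shape above follows from the input size and patch size: 518 / 14 = 37 patches per side, 37 × 37 = 1369 patch tokens, plus one prefix token. The sketch below reproduces that arithmetic and maps a token index back to its pixel box. It assumes a single class token at index 0 and row-major patch ordering; the function names are illustrative, not timm APIs.

```python
def token_layout(image_size=518, patch_size=14, num_prefix_tokens=1):
    """Return (patches_per_side, total_tokens) for a square ViT input."""
    grid = image_size // patch_size
    return grid, grid * grid + num_prefix_tokens


def patch_box(token_index, image_size=518, patch_size=14, num_prefix_tokens=1):
    """Map a patch-token index to its (left, top, right, bottom) pixel box.

    Assumes prefix tokens come first and patches are ordered row by row.
    """
    grid, num_tokens = token_layout(image_size, patch_size, num_prefix_tokens)
    if not num_prefix_tokens <= token_index < num_tokens:
        raise ValueError("token_index does not refer to a patch token")
    row, col = divmod(token_index - num_prefix_tokens, grid)
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


grid, num_tokens = token_layout()
print(grid, num_tokens)   # 37 patches per side, 1370 tokens in total
print(patch_box(1))       # first patch token, top-left corner of the image
print(patch_box(1369))    # last patch token, bottom-right corner of the image
```

This kind of mapping is useful when patch-level features are needed, for example for dense matching, rather than a single pooled embedding.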

📚 Documentation

Model details

Property	Details
Model type	Image classification / feature backbone
Parameters (M)	1136.5
GMACs	1784.2
Activations (M)	2757.9
Image size	518 x 518
Papers	DINOv2: Learning Robust Visual Features without Supervision; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original code	https://github.com/facebookresearch/dinov2
Pretraining dataset	LVD-142M
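
As a rough guide to what the parameter count in the table means in practice, the sketch below estimates the memory needed just to hold the weights at different precisions. It is back-of-the-envelope arithmetic under stated assumptions: it ignores activations, buffers, and framework overhead, and the function name is illustrative.

```python
PARAMS_MILLIONS = 1136.5  # from the model details table

# Bytes used per parameter for common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(params_millions, dtype):
    """Approximate weight storage in GiB for the given dtype."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 2**30


fp32_gib = weight_memory_gib(PARAMS_MILLIONS, "float32")
fp16_gib = weight_memory_gib(PARAMS_MILLIONS, "float16")
print(f"float32: {fp32_gib:.2f} GiB, float16: {fp16_gib:.2f} GiB")
```

In float32 the weights alone come to roughly 4.2 GiB, and half precision halves that, so actual inference at 518 x 518 needs noticeably more memory than these figures once activations are included.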

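Model comparison

You can explore this model's dataset and runtime metrics in the timm model results.

📄 License

This project is released under the Apache-2.0 license.

📖 Citation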

@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}