vit_large_patch14_dinov2.lvd142m: an open-source image feature model for extracting key image features

Vit Large Patch14 Dinov2.lvd142m

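Developed by pcuenq

An image feature model based on the Vision Transformer (ViT), pretrained on the LVD-142M dataset with the self-supervised DINOv2 method.

Image classification

Transformers

License: Apache-2.0 #self-supervised visual features #large-scale image processing #Transformer architecture

Downloads: 18

Published: 1/21/2025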

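Model overview

This is a large Vision Transformer model used mainly for image feature extraction and image classification. It is pretrained on the LVD-142M dataset with the self-supervised DINOv2 method and produces high-quality image representations.

Model characteristics

Self-supervised pretraining

Pretrained on the LVD-142M dataset with the DINOv2 self-supervised method, with no manually labeled data required

Large-scale Vision Transformer

Based on the ViT-Large architecture, with 304.4 million parameters

High-resolution processing

Supports high-resolution image input of 518×518 pixels

Model capabilities

Image feature extraction

Image classification

Image representation learning

Use cases

Computer vision

Image classification

Applicable to image classification tasks such as object recognition and scene classification

Image retrieval

Retrieve similar images by comparing the extracted image features

Visual representation learning

Serves as a base model for other vision tasks such as object detection and segmentation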
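
The image retrieval use case above comes down to ranking gallery embeddings by their similarity to a query embedding. The following is a minimal, dependency-free sketch that assumes cosine similarity as the metric (the source does not name one). The helpers `l2_normalize` and `rank_by_cosine` and the toy 3-dimensional vectors are hypothetical stand-ins for the model's real embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_by_cosine(query, gallery):
    """Return (index, similarity) pairs for the gallery, most similar first."""
    q = l2_normalize(query)
    scored = []
    for idx, vec in enumerate(gallery):
        g = l2_normalize(vec)
        scored.append((idx, sum(a * b for a, b in zip(q, g))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings from this model would be much longer.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
ranking = rank_by_cosine(query, gallery)
nearest_index = ranking[0][0]
```

In practice the vectors would be the pooled outputs of the model created with `num_classes=0`, and for large galleries a batched matrix product or a dedicated nearest-neighbor index would likely replace the Python loop.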

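🚀 vit_large_patch14_dinov2.lvd142m model card

This is a Vision Transformer (ViT) image feature model, pretrained on the LVD-142M dataset with the self-supervised DINOv2 method. It can be used for image feature extraction and related tasks.

🚀 Quick start

The examples in the usage section show how to run the model for image classification and how to extract image embeddings.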

✨ Key features

Model type: image classification / feature backbone
Model statistics:
- Parameters (M): 304.4
- GMACs: 507.1
- Activations (M): 1058.8
- Image size: 518 x 518
Related papers:
- DINOv2: Learning Robust Visual Features without Supervision
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original code repository: https://github.com/facebookresearch/dinov2
Pretraining dataset: LVD-142M

Attribute	Details
Model type	Image classification / feature backbone
Training data	LVD-142M

💻 Usage examples

Basic usage

Image classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

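output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# Note: this checkpoint is published as a feature backbone. If the loaded model has no
# trained classifier head, the values below are not class probabilities until a head is fine-tuned.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)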

Image embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_large_patch14_dinov2.lvd142m',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1370, 1024) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
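
The `(1, 1370, 1024)` shape noted for `forward_features` can be checked by hand from the 518 x 518 input size and the patch size of 14 in the model name. The sketch below works through that arithmetic, assuming a single class token is prepended and no other prefix tokens are used. The helper `vit_token_count` is hypothetical and is not part of timm.

```python
def vit_token_count(image_size, patch_size, num_prefix_tokens=1):
    """Return (patches per side, total sequence length) for a square ViT input.

    num_prefix_tokens=1 assumes one class token and no register tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size  # patches along each side
    return grid, grid * grid + num_prefix_tokens


grid, seq_len = vit_token_count(518, 14)
# grid is 37 and seq_len is 37 * 37 + 1 = 1370, matching the middle dimension above.
```

Under the same assumption, the first token along that dimension would be the class token and the remaining 1369 tokens could be reshaped into a 37 x 37 grid for dense tasks such as segmentation.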

📚 Documentation

Explore the dataset and runtime metrics for this model in the timm model results.

📄 License

This project is licensed under Apache-2.0.

📚 Citation

@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}