vit_giant_patch14_dinov2.lvd142m: Open-Source Image Feature Extraction Model

Vit Giant Patch14 Dinov2.lvd142m

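Developed by timm

A giant image feature extraction model based on the Vision Transformer (ViT), pretrained on the LVD-142M dataset with the self-supervised DINOv2 method

Image Classification

Transformers

License: Apache-2.0 #self-supervised-visual-features #large-image-processing #DINOv2-pretraining

Downloads: 6,911

Released: 5/9/2023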

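Model Overview

This is a giant Vision Transformer model for image feature extraction and image classification. It is pretrained on a large dataset (LVD-142M) with the DINOv2 self-supervised learning method and produces high-quality image representations.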

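Key Features

Self-supervised pretraining

Pretrained on the LVD-142M dataset with the DINOv2 self-supervised learning method, so no human-labeled data is required

Giant architecture

Based on the ViT-Giant architecture with 1,136.5 million (about 1.1 billion) parameters, giving it the capacity to capture richer image features

High-resolution input

Takes 518×518-pixel input images, which suits visually detailed content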
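To make the resolution concrete: with a 14-pixel patch size (the `patch14` in the model name), a 518×518 input is cut into a 37×37 grid of patches. The minimal sketch below assumes one class token, no extra register tokens, and row-major patch order, which is consistent with the (1, 1370, 1536) shape that `forward_features` returns in the embedding example. It computes the sequence length and maps a patch token back to the pixel region it covers; both helper names are hypothetical, not timm functions.

```python
def vit_token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Number of tokens a ViT produces for a square image (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size          # patches per side
    return grid * grid + num_prefix_tokens   # patch tokens + class token(s)


def patch_pixel_box(token_index: int, image_size: int = 518, patch_size: int = 14,
                    num_prefix_tokens: int = 1) -> tuple:
    """Pixel box (left, top, right, bottom) covered by one patch token.

    Assumes patch tokens follow the prefix (class) token(s) in row-major order.
    """
    grid = image_size // patch_size
    patch_idx = token_index - num_prefix_tokens
    if not 0 <= patch_idx < grid * grid:
        raise IndexError("token_index does not refer to a patch token")
    row, col = divmod(patch_idx, grid)
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


print(518 // 14)                 # 37 patches per side
print(vit_token_count(518, 14))  # 1370 = 37 * 37 + 1
print(patch_pixel_box(1))        # first patch token: (0, 0, 14, 14)
print(patch_pixel_box(1369))     # last patch token: (504, 504, 518, 518)
```

The same arithmetic shows why larger inputs are costly: the token count grows with the square of the side length.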

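Flexible outputs

Can produce either classification outputs or raw image feature embeddings, making it usable for a variety of downstream tasks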

Capabilities

Image feature extraction

Image classification

Image embedding generation

Visual content understanding

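Use Cases

Computer vision

Image classification

Classifies an input image and outputs the most likely classes

The DINOv2 paper reports strong results across a range of vision benchmarks

Feature extraction

Extracts deep feature representations of images for downstream tasks

The resulting features can be used for tasks such as retrieval and matching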
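For retrieval and matching, a common approach (not something this model card prescribes) is to compare embeddings by cosine similarity. The minimal pure-Python sketch below ranks a small gallery against a query. The 3-dimensional vectors are made-up stand-ins for the 1536-dimensional embeddings the model produces, and `rank_by_cosine` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query, gallery):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-d vectors standing in for real image embeddings.
gallery = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.1, 0.9, 0.1],
    "car_photo": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

for name, score in rank_by_cosine(query, gallery):
    print(f"{name}: {score:.3f}")
```

With real embeddings, normalizing every vector to unit length once lets you replace the cosine with a plain dot product, which scales better to large galleries.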

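Content understanding

Visual content analysis

Analyzes image content to recognize the visual elements and scenes it contains

Captures high-level semantic information in images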

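🚀 vit_giant_patch14_dinov2.lvd142m Model Card

This is a Vision Transformer (ViT) image feature model pretrained on the LVD-142M dataset with the self-supervised DINOv2 method. It can be used for image feature extraction and related tasks.

🚀 Quick Start

The model can be used for image classification and for extracting image embeddings; the examples below show both.

💻 Usage Examples

Basic usage

Image classification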

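from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.no_grad and torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
)).convert('RGB')  # make sure the image has 3 channels

model = timm.create_model('vit_giant_patch14_dinov2.lvd142m', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

with torch.no_grad():  # inference only, so skip gradient tracking
    output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# softmax -> percentages, then keep the 5 highest scores and their indices.
# Note: if the loaded checkpoint has no classifier head (model.num_classes == 0),
# `output` is the pooled embedding rather than class logits, so these "classes"
# are only feature dimensions; attach a trained head before reading them as labels.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)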
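The last line of the classification example turns raw scores into percentages with a softmax and keeps the five largest. The pure-Python sketch below reproduces that post-processing for a single image so the arithmetic is visible. The logits are invented values, and the sketch subtracts the maximum score before exponentiating, a standard stability trick that leaves the softmax result unchanged. Unlike `torch.topk`, which returns separate value and index tensors, this hypothetical `topk_percent` helper returns (percent, index) pairs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)                            # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_percent(logits, k=5):
    """Return (percent, class_index) for the k highest softmax probabilities."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(probs[i] * 100, i) for i in ranked]


# Invented logits for a toy 7-class problem.
logits = [1.0, 3.5, 0.2, 2.8, -1.0, 3.9, 0.5]
for pct, idx in topk_percent(logits, k=5):
    print(f"class {idx}: {pct:.2f}%")
```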

Image embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'vit_giant_patch14_dinov2.lvd142m',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 1370, 1536) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
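`forward_head(..., pre_logits=True)` reduces the token sequence from `forward_features` to one vector per image, and how it does so depends on the model's `global_pool` setting. The sketch below assumes timm's default class-token pooling for ViT (`'token'`) and, for comparison, shows average pooling over the patch tokens with the class token skipped. It runs on a toy sequence of 4-dimensional tokens instead of the real (1370, 1536) tensor; `pool_tokens` is a hypothetical helper, not a timm function.

```python
def pool_tokens(tokens, mode="token", num_prefix_tokens=1):
    """Reduce a (num_tokens, dim) list of token vectors to one (dim,) vector.

    mode="token": take the class token (index 0).
    mode="avg":   average the patch tokens, skipping the prefix token(s).
    """
    if mode == "token":
        return list(tokens[0])
    if mode == "avg":
        patches = tokens[num_prefix_tokens:]
        dim = len(patches[0])
        return [sum(t[d] for t in patches) / len(patches) for d in range(dim)]
    raise ValueError(f"unknown pooling mode: {mode}")


# Toy sequence: 1 class token followed by 4 patch tokens (a 2x2 grid), dim = 4.
tokens = [
    [9.0, 9.0, 9.0, 9.0],  # class token
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(pool_tokens(tokens, "token"))  # [9.0, 9.0, 9.0, 9.0]
print(pool_tokens(tokens, "avg"))    # [0.25, 0.25, 0.25, 0.25]
```

Either vector can serve as an image embedding; which works better for a particular downstream task is best checked empirically.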

📚 Documentation

Model Details

Property	Details
Model type	Image classification / feature backbone
Params (M)	1136.5
GMACs	1784.2
Activations (M)	2757.9
Image size	518 x 518
Papers	DINOv2: Learning Robust Visual Features without Supervision; An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Original code	https://github.com/facebookresearch/dinov2
Pretraining dataset	LVD-142M
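The parameter count in the table gives a quick lower bound on the memory needed just to store the weights. The sketch below multiplies the 1136.5 M parameters by the bytes per element of common dtypes. It ignores activations, framework overhead, and any optimizer state, so actual usage will be higher; `weight_memory_gb` is a hypothetical helper.

```python
PARAMS_MILLIONS = 1136.5  # from the model details table

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(params_millions, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS_MILLIONS, dtype):.2f} GB")
```

In float32 this is roughly 4.55 GB of weights, and about half that in float16 or bfloat16.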

Model Comparison

Explore this model's dataset and runtime metrics in the timm model results.

📄 License

This model is released under the Apache-2.0 license.

📖 Citation

@misc{oquab2023dinov2,
  title={DINOv2: Learning Robust Visual Features without Supervision},
  author={Oquab, Maxime and Darcet, Timothée and Moutakanni, Theo and Vo, Huy V. and Szafraniec, Marc and Khalidov, Vasil and Fernandez, Pierre and Haziza, Daniel and Massa, Francisco and El-Nouby, Alaaeldin and Howes, Russell and Huang, Po-Yao and Xu, Hu and Sharma, Vasu and Li, Shang-Wen and Galuba, Wojciech and Rabbat, Mike and Assran, Mido and Ballas, Nicolas and Synnaeve, Gabriel and Misra, Ishan and Jegou, Herve and Mairal, Julien and Labatut, Patrick and Joulin, Armand and Bojanowski, Piotr},
  journal={arXiv:2304.07193},
  year={2023}
}

@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}