webssl-dino1b-full2b-224: an open-source vision model for learning visual representations without language supervision

Webssl Dino1b Full2b 224

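Developed by facebook

A 1-billion-parameter Vision Transformer trained with DINOv2 self-supervised learning on 2 billion web images. It learns visual representations without any language supervision.

Image classification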

Transformers

#Self-supervised visual learning #Billion-parameter scale #No language supervision

Downloads: 1,172

Released: 4/25/2025

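Model Overview

The model shows that purely visual learning, when scaled appropriately, can match or exceed the performance of language-supervised models. It is suited to a wide range of vision tasks.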

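Model Features

Large-scale self-supervised learning

Trained on 2 billion web images without language supervision

High-performance visual representations

Matches or exceeds language-supervised models on a variety of vision tasks

Efficient architecture design

ViT architecture with width 1536, depth 40, and 24 attention heads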

Model Capabilities

Image feature extraction

Visual representation learning

Image classification

Object detection

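Use Cases

Computer vision

Image classification

Use the image features extracted by the model for classification tasks

Object detection

Use the visual representations learned by the model for object detection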
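
As a rough illustration of the classification use case, the sketch below trains a linear probe on top of frozen features. Random tensors stand in for features extracted by the model, and `NUM_CLASSES`, the sample count, and the optimizer settings are hypothetical choices for this example, not values from the model card.

```python
import torch
from torch import nn

torch.manual_seed(0)

FEATURE_DIM = 1536  # ViT width stated in this card
NUM_CLASSES = 10    # hypothetical number of labels
NUM_SAMPLES = 64    # hypothetical dataset size

# Stand-ins for frozen backbone features and their labels (random here).
features = torch.randn(NUM_SAMPLES, FEATURE_DIM)
labels = torch.randint(0, NUM_CLASSES, (NUM_SAMPLES,))

# The probe is a single linear layer; the backbone itself is never updated.
probe = nn.Linear(FEATURE_DIM, NUM_CLASSES)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    logits = probe(features)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Predicted class index for each sample.
predictions = probe(features).argmax(dim=-1)
```

In practice the random `features` tensor would be replaced by features computed once by the frozen model and cached, which keeps probe training cheap.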

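🚀 Web-SSL DINO ViT-1B: 2B MetaCLIP data, 224 resolution

This project is a 1-billion-parameter Vision Transformer (ViT) trained without language supervision on web-scale image data using the DINOv2 self-supervised learning method. The model was introduced in the paper "Scaling Language-Free Visual Representation Learning" (Fan et al., 2025).

✨ Key Features

Trained with self-supervised learning, without language supervision, on large-scale web image data.
When scaled appropriately, purely visual learning can match or even exceed language-supervised models such as CLIP across a range of vision tasks.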

💻 Usage Example

Basic usage

from transformers import AutoImageProcessor, Dinov2Model
import torch
from PIL import Image

processor = AutoImageProcessor.from_pretrained('facebook/webssl-dino1b-full2b-224')
# 'eager' and 'sdpa' attn_implementation supported
model = Dinov2Model.from_pretrained('facebook/webssl-dino1b-full2b-224')

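# Load an image and convert it to RGB so grayscale or RGBA files are handled too
image = Image.open('path/to/image.jpg').convert('RGB')
# Resize, normalize, and batch the image into PyTorch tensors
inputs = processor(images=image, return_tensors="pt")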
with torch.no_grad():
    outputs = model(**inputs)

cls_features = outputs.last_hidden_state[:, 0]  # CLS token features
patch_features = outputs.last_hidden_state[:, 1:] # patch-wise token features
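
The two feature tensors above can be post-processed in several ways. The following is a minimal sketch, not part of the original card, that uses a random tensor in place of `outputs.last_hidden_state`. It assumes a patch size of 14 (the usual DINOv2 setting, not stated in this card), which at 224×224 would give a 16×16 grid of patch tokens plus one CLS token.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed shapes: 224 / 14 = 16 patches per side, hidden size 1536.
batch_size, grid, hidden = 2, 16, 1536

# Random stand-in for outputs.last_hidden_state: (batch, 1 + 16*16, hidden).
last_hidden_state = torch.randn(batch_size, 1 + grid * grid, hidden)

cls_features = last_hidden_state[:, 0]     # (batch, hidden)
patch_features = last_hidden_state[:, 1:]  # (batch, 256, hidden)

# Alternative global descriptor: the mean over all patch tokens.
pooled = patch_features.mean(dim=1)        # (batch, hidden)

# Dense feature map in (batch, channels, height, width) layout,
# e.g. as input to a detection or segmentation head.
feature_map = patch_features.reshape(batch_size, grid, grid, hidden).permute(0, 3, 1, 2)

# Pairwise cosine similarity between the images' CLS embeddings.
normalized = F.normalize(cls_features, dim=-1)
similarity = normalized @ normalized.T     # (batch, batch)
```

The reshape relies on patch tokens being ordered row by row, so token `r * 16 + c` lands at spatial position `(r, c)` of the feature map.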

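📚 Detailed Documentation

Model details

Attribute	Details
Architecture	ViT (width 1536, depth 40, 24 heads)
Parameters	1 billion
Resolution	224×224 pixels
Training	Self-supervised Web-DINO training on 2 billion image samples from MetaCLIP web data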
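
As a back-of-the-envelope check on the figures in the table, the sketch below estimates the transformer parameter count from the stated width and depth. The MLP expansion ratio of 4 and the patch size of 14 are assumptions (common ViT and DINOv2 defaults) that this card does not state, and the estimate ignores embeddings, layer norms, and any other small terms.

```python
# Values stated in the table above.
width, depth, heads = 1536, 40, 24
resolution = 224

# Assumptions, not stated in the card.
mlp_ratio = 4
patch_size = 14

head_dim = width // heads  # per-head dimension

# Attention: query, key, value, and output projections (weights + biases).
attn_params = 4 * width * width + 4 * width

# MLP: two linear layers, width -> mlp_ratio*width -> width (weights + biases).
mlp_params = 2 * width * (mlp_ratio * width) + mlp_ratio * width + width

block_params = attn_params + mlp_params
approx_total = depth * block_params

# Sequence length at the stated resolution: patch tokens plus one CLS token.
num_patches = (resolution // patch_size) ** 2
num_tokens = num_patches + 1

print(f"head_dim={head_dim}, tokens={num_tokens}, params~{approx_total / 1e9:.2f}B")
```

Under these assumptions the estimate comes to roughly 1.13 billion parameters, consistent with the "1 billion" figure in the table.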

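Model description

Web-SSL DINO 1B is a 1-billion-parameter Vision Transformer trained with self-supervised learning on 2 billion web images, without language supervision. The model shows that purely visual learning, when scaled appropriately, can match or exceed language-supervised models such as CLIP across a range of vision tasks.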

[Figure: WebSSL model overview]

📄 License

This project is released under the CC-BY-NC-4.0 license.

📚 Citation

@article{fan2025scaling,
  title={Scaling Language-Free Visual Representation Learning}, 
  author={David Fan and Shengbang Tong and Jiachen Zhu and Koustuv Sinha and Zhuang Liu and Xinlei Chen and Michael Rabbat and Nicolas Ballas and Yann LeCun and Amir Bar and Saining Xie},
  year={2025},
  eprint={2504.01017},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}