Depth Anything, a Powerful Depth Estimation Model, Is Open Source: Zero-Shot Depth Estimation Trained on a Massive Image Corpus


Depth Anything Base Hf

Developed by LiheYoung

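Depth Anything is a depth estimation model built on the DPT architecture with a DINOv2 backbone. It was trained on about 62 million images and achieves state-of-the-art zero-shot depth estimation performance.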

3D Vision

Transformers

License: Apache-2.0 #zero-shot depth estimation #large-scale unsupervised training #DPT architecture

Downloads: 4,101

Release date: 1/22/2024

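Model Overview

The model is designed for depth estimation: it predicts depth information from a single image and suits a wide range of computer vision applications.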

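Model Features

Large-scale training data

The model was trained on about 62 million images, which gives it strong generalization ability.

Zero-shot depth estimation

It can be applied directly to depth estimation in diverse scenes without domain-specific fine-tuning.

Advanced architecture

It combines the DPT architecture with a DINOv2 backbone for high-quality depth prediction.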

Model Capabilities

Single-image depth estimation

Zero-shot prediction

Computer vision analysis

Use Cases

Computer Vision

3D scene reconstruction

Predicts depth information from a single 2D image to assist 3D scene reconstruction.
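For 3D reconstruction, a depth map is typically lifted to a point cloud using a pinhole camera model. The sketch below is illustrative only. It assumes the depth map is already metric and that the camera intrinsics fx, fy, cx, cy are known. This model's raw output is relative rather than metric, so it would first need converting and aligning. The function name backproject is hypothetical.

```python
import numpy as np


def backproject(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift an (H, W) metric depth map to an (H*W, 3) point cloud.

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    h, w = depth.shape
    # u holds pixel column indices, v holds pixel row indices, both shaped (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Stack per-pixel (x, y, z) and flatten to one point per row
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Usage with a tiny constant-depth map and made-up intrinsics
depth = np.full((2, 2), 2.0)
points = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points.shape)  # → (4, 3)
```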

Augmented reality

Provides scene depth information for AR applications.
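One common reason AR applications need scene depth is occlusion: a virtual object should be hidden wherever a real surface is closer to the camera. The sketch below assumes both depth maps use the same units, with larger values meaning farther away. The model's raw relative output may first need converting to that form. The function name composite_with_occlusion is hypothetical.

```python
import numpy as np


def composite_with_occlusion(frame: np.ndarray, scene_depth: np.ndarray,
                             overlay: np.ndarray, overlay_depth: np.ndarray) -> np.ndarray:
    """Draw overlay pixels only where the virtual content is in front of the real scene.

    frame, overlay: (H, W, 3) images; scene_depth, overlay_depth: (H, W) depths
    (use np.inf in overlay_depth where there is no virtual content).
    """
    visible = overlay_depth < scene_depth  # True where the virtual object is closer
    out = frame.copy()
    out[visible] = overlay[visible]
    return out


# Usage: the real surface is at depth 1 on the left and 3 on the right;
# a virtual object at depth 2 is occluded on the left and visible on the right
frame = np.zeros((1, 2, 3), dtype=np.uint8)
overlay = np.full((1, 2, 3), 255, dtype=np.uint8)
scene_depth = np.array([[1.0, 3.0]])
overlay_depth = np.array([[2.0, 2.0]])
result = composite_with_occlusion(frame, scene_depth, overlay, overlay_depth)
```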

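🚀 Depth Anything (base-sized model, Transformers version)

Depth Anything is a depth estimation model that leverages large-scale unlabeled data and achieves state-of-the-art results in both relative and absolute depth estimation. It provides strong support for related vision tasks and can be used in scenarios such as zero-shot depth estimation.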
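A relative depth prediction is defined only up to an unknown scale and shift. To compare it with, or convert it to, absolute values, a common approach when evaluating such models is to fit those two numbers against a few known depths by least squares. The sketch below assumes pred and target are already in the same space. For Depth Anything's relative output this is often done in inverse-depth (disparity) space. The function name align_scale_shift is hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def align_scale_shift(pred: np.ndarray, target: np.ndarray,
                      mask: Optional[np.ndarray] = None) -> Tuple[float, float]:
    """Least-squares fit of scale s and shift t so that s * pred + t ≈ target.

    mask selects the pixels with valid ground truth (all pixels if omitted).
    """
    if mask is None:
        mask = np.ones_like(pred, dtype=bool)
    p = pred[mask].ravel()
    y = target[mask].ravel()
    # Design matrix with one column for the prediction and one constant column
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)


# Usage: a synthetic target that is exactly 2 * pred + 1
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = 2.0 * pred + 1.0
s, t = align_scale_shift(pred, target)
aligned = s * pred + t
```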

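🚀 Quick Start

Depth Anything was introduced in the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang et al. and first released in this repository. An online demo is also available.

Disclaimer: the team that released Depth Anything did not write a model card for this model. This model card was written by the Hugging Face team.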

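✨ Key Features

Architecture: Depth Anything uses the DPT architecture with DINOv2 as its backbone.
Rich training data: the model was trained on about 62 million images and achieves state-of-the-art results in both relative and absolute depth estimation.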
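A DINOv2 ViT-B/14 backbone splits its input into 14×14 patches, so images are usually resized so that both sides are multiples of 14. The sketch below shows one such policy under stated assumptions: a 518-pixel target, with the shorter side scaled up or down to reach it ("lower bound"), as in the original Depth Anything repository's transforms. The Hugging Face image processor may resize somewhat differently. The helper names are hypothetical.

```python
import math
from typing import Tuple

PATCH_SIZE = 14  # DINOv2 ViT-B/14 uses 14x14 patches


def constrain_to_multiple(value: float, multiple: int, min_val: int = 0) -> int:
    """Round value to the nearest multiple; round up instead if that falls below min_val."""
    x = round(value / multiple) * multiple
    if x < min_val:
        x = math.ceil(value / multiple) * multiple
    return int(x)


def lower_bound_resize(height: int, width: int, target: int = 518,
                       multiple: int = PATCH_SIZE) -> Tuple[int, int]:
    """Scale (keeping aspect ratio) so the shorter side reaches target,
    then snap both sides to multiples of the patch size."""
    scale = max(target / height, target / width)
    new_h = constrain_to_multiple(height * scale, multiple, min_val=target)
    new_w = constrain_to_multiple(width * scale, multiple, min_val=target)
    return new_h, new_w


# Usage: a 640x480 (W x H) image such as the COCO example below
h, w = lower_bound_resize(480, 640)
print(h, w)  # → 518 686
num_patches = (h // PATCH_SIZE) * (w // PATCH_SIZE)
```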

Figure: Depth Anything overview. Taken from the original paper.

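📚 Documentation

Intended uses & limitations

You can use the raw model for tasks such as zero-shot depth estimation. See the model hub for other versions that may interest you.

How to use

Here is how to use the model to perform zero-shot depth estimation:

Basic usage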

from transformers import pipeline
from PIL import Image
import requests

# load pipe
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference
depth = pipe(image)["depth"]

Advanced usage

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-base-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-base-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

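# interpolate to original size; image.size is (width, height), so reverse it
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction: min-max normalize to 0-255 and convert to a grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = ((output - output.min()) / (output.max() - output.min()) * 255).astype("uint8")
depth = Image.fromarray(formatted)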

For more code examples, see the documentation.

BibTeX entry and citation info

@misc{yang2024depth,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
      year={2024},
      eprint={2401.10891},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}