dpt-dinov2-giant-nyu open-source model - free monocular depth estimation

Dpt Dinov2 Giant Nyu

Developed by facebook

A DPT model that uses DINOv2 as its backbone network, intended for monocular depth estimation

3D Vision

Transformers

Open Source License: Apache-2.0 #unsupervised depth estimation #DINOv2 backbone network #high-accuracy depth maps

Downloads 29

Release Time : 11/1/2023

Model Overview

This model combines the visual feature extraction of DINOv2 with the dense prediction architecture of DPT, and predicts depth from a single image

Model Features

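DINOv2 backbone network

Uses DINOv2, pretrained without supervision, as the feature extractor, providing strong visual representations

Dense prediction architecture

Built on the DPT architecture, so it can produce high-resolution dense depth maps

High-accuracy depth estimation

Predicts scene depth from a single RGB image, with no stereo input required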

Model Capabilities

Monocular depth estimation

Image depth map generation

Scene understanding

Use Cases

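Computer vision

3D scene reconstruction

Estimates scene depth from a single image to support 3D scene reconstruction

Augmented reality

Supplies scene depth to AR applications for more realistic placement of virtual objects

Robot vision

Autonomous navigation

Supplies environment depth to robots to support path planning and obstacle avoidance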
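
To make the 3D scene reconstruction use case concrete, here is a minimal sketch of back-projecting a depth map into a camera-frame point cloud with a pinhole camera model. The helper name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration only; they are not part of this model or of Transformers, and real intrinsics have to come from your own camera calibration. The sketch also assumes the depth values are distances along the optical axis, which the source does not state.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) array of camera-frame points.

    Hypothetical helper: fx, fy are focal lengths in pixels and (cx, cy) is the
    principal point, all assumed to be known from calibration.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns (x), v runs along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy 2x3 depth map with every pixel 2 units away; the intrinsics are made-up values.
toy_depth = np.full((2, 3), 2.0)
points = depth_to_points(toy_depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

In practice `toy_depth` would be replaced by the resized prediction from the model, converted to a NumPy array, and the resulting points could be passed to any point-cloud viewer or reconstruction tool.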

🚀 Model Card: DPT model with a DINOv2 backbone

This is a DPT (Dense Prediction Transformer) model with a DINOv2 backbone, and it performs strongly on depth estimation.

🚀 Quick Start

This DPT (Dense Prediction Transformer) model uses the DINOv2 backbone proposed in DINOv2: Learning Robust Visual Features without Supervision by Oquab et al.

[Figure: DPT architecture. Taken from the original paper.]

Resources

Usage with Transformers

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-giant-nyu")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-giant-nyu")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

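# interpolate to the original size; PIL's image.size is (width, height),
# so it is reversed to the (height, width) order that interpolate expects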
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction: rescale to 0-255 and convert to an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
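
The visualization step above scales by the maximum value only. Bicubic interpolation can overshoot and produce slightly negative values, which wrap around when cast to `uint8`, and an all-zero map would cause a division by zero. As a hedged alternative, the sketch below applies min-max scaling instead. `depth_to_uint8` is a hypothetical helper name, not a Transformers function, and the small array stands in for `prediction.squeeze().cpu().numpy()`.

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max scale a float depth map to 0-255 and return it as uint8 (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float64)
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min < 1e-8:
        # Constant map: avoid dividing by zero and return an all-black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)  # values now lie in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)

# Synthetic stand-in for the model output, including a small negative overshoot.
demo = np.array([[-0.1, 1.0], [2.0, 4.9]])
demo_img = depth_to_uint8(demo)
```

The result can be passed to `Image.fromarray` exactly like `formatted` in the snippet above. Note that min-max scaling shifts the darkest pixel to 0, so the image shows relative rather than absolute depth.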

📚 Documentation

Intended uses

The model is intended to show that using DINOv2 as the backbone within the DPT framework yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}