dpt-dinov2-large-nyu: an open-source model for high-accuracy depth estimation, free to deploy

Dpt Dinov2 Large Nyu

Developed by facebook

This is a DPT model built on a DINOv2 backbone network, used for depth estimation.

3D Vision

Transformers

Open-source license: Apache-2.0 #depth-estimation #unsupervised-visual-features #dense-prediction-transformer

Downloads: 80

Release date: 10/31/2023

Model Overview

The DPT (Dense Prediction Transformer) model is combined with a DINOv2 backbone network to produce high-quality depth estimates.

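Model Features

DINOv2 backbone network

Uses robust visual features learned without supervision to improve model performance.

Dense Prediction Transformer

Uses the DPT framework for dense, per-pixel prediction, which suits depth estimation.

High-quality depth estimation

Generates high-accuracy depth maps and can be applied to a wide range of visual scenes.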

Model Capabilities

Image depth estimation

Visual feature extraction

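Use Cases

Computer vision

Scene depth estimation

Runs depth estimation on an input image and generates the corresponding depth map.

The resulting depth maps can be used in applications such as 3D reconstruction and augmented reality.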
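
As a rough illustration of the 3D reconstruction use case, a depth map can be back-projected into a point cloud with a pinhole camera model. The sketch below is not part of the model or of Transformers: `depth_to_points` is a hypothetical helper, the intrinsics `fx`, `fy`, `cx`, `cy` are assumed to be known for your camera, and it assumes the depth values are in a consistent metric unit, which this page does not confirm.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) array of XYZ points.

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), all expressed in pixels.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    # Stack into (H, W, 3), then flatten to one point per pixel.
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy example with made-up intrinsics and a constant depth plane.
toy_depth = np.full((2, 3), 2.0)
points = depth_to_points(toy_depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

With real data, `depth` would be the resized prediction from the model and the intrinsics would come from camera calibration.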

🚀 Model Card: DPT Model with a DINOv2 Backbone

This model uses DINOv2 as the backbone of the DPT (Dense Prediction Transformer) framework, yielding a powerful depth estimator.

🚀 Quick Start

Model Details

The DPT (Dense Prediction Transformer) model uses as its backbone DINOv2, which was proposed by Oquab et al. in DINOv2: Learning Robust Visual Features without Supervision.

DPT architecture. Taken from the original paper.

Usage in Transformers

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-large-nyu")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-large-nyu")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size (PIL's image.size is (width, height), so reverse it to (height, width))
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
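
The visualization step above scales by the maximum value only. A possible variant, sketched here under the assumption that only relative depth matters for display, is min-max normalization, which uses the full 0 to 255 range and guards against a constant map. `depth_to_uint8` is a hypothetical helper name, not part of Transformers.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a depth array to uint8 values in [0, 255] for display.

    Hypothetical helper: absolute depth values are discarded, so the result
    is only suitable for visualization.
    """
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    # A constant map has no contrast; return all zeros instead of dividing by zero.
    if d_max - d_min < 1e-8:
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 255.0).round().astype(np.uint8)


# Toy input standing in for `output` from the snippet above.
toy = np.array([[1.0, 2.0], [3.0, 5.0]])
toy_image = depth_to_uint8(toy)
flat_image = depth_to_uint8(np.full((2, 2), 4.0))
```

In the snippet above, `Image.fromarray(depth_to_uint8(output))` could then replace the `formatted` computation.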

📚 Documentation

Intended Use

This model is intended to provide a powerful depth estimator by using DINOv2 as the backbone of the DPT framework.

BibTeX Entry and Citation Info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}