dpt-dinov2-base-nyu: an open-source model for free, accurate depth estimation


Dpt Dinov2 Base Nyu

Developed by Facebook

A DPT model that uses DINOv2 as its backbone, intended for depth estimation.

3D Vision

Transformers

Open-source license: Apache-2.0 #depth-estimation #dinov2-backbone #dense-prediction

Downloads: 146

Released: 10/31/2023

Model overview

This model combines the visual feature extraction of DINOv2 with the DPT framework to produce high-quality depth estimates.

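Model features

DINOv2 backbone

The model uses DINOv2 as its backbone, which supplies strong visual feature extraction.

DPT framework

The model uses the Dense Prediction Transformer (DPT) architecture for accurate depth estimation.

Learning without supervision

The DINOv2 backbone is trained without supervision, which lets it learn robust visual features.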

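Model capabilities

Depth estimation

Image analysis

Use cases

Computer vision

Scene depth estimation

Estimates depth for every pixel in an image.

Produces a depth map that matches the input image size once the prediction is resized to it.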
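The usage code on this page resizes the prediction to the original image size with an interpolation step. The sketch below illustrates that resize on a plain array, assuming NumPy is available. It uses nearest-neighbour sampling for brevity, whereas the usage code relies on bicubic interpolation in PyTorch. `resize_nearest` is a hypothetical helper written for this illustration, not part of Transformers.

```python
import numpy as np

def resize_nearest(depth, out_hw):
    """Resize a 2-D depth map to (height, width) by nearest-neighbour sampling."""
    in_h, in_w = depth.shape
    out_h, out_w = out_hw
    # Map every output row/column back to the source index it samples from.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return depth[rows[:, None], cols[None, :]]

# PIL reports size as (width, height); array code expects (height, width),
# which is why the usage code passes image.size[::-1].
pil_size = (640, 480)
low_res = np.arange(12, dtype=np.float32).reshape(3, 4)  # stand-in for a prediction
full = resize_nearest(low_res, pil_size[::-1])
print(full.shape)
```

Reversing the PIL size before resizing is the detail that is easiest to get wrong: passing it unreversed would swap the height and width of the output.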

🚀 DPT model with a DINOv2 backbone

This DPT (Dense Prediction Transformer) model uses DINOv2 as its backbone and performs strongly on depth estimation. It builds on DINOv2's ability to learn robust visual features without supervision.

🚀 Quick start

Model details

This is a DPT (Dense Prediction Transformer) model with a DINOv2 backbone. The combination was proposed in DINOv2: Learning Robust Visual Features without Supervision by Oquab et al.

[Figure: DPT architecture. Taken from the original paper.]


Usage with Transformers

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-base-nyu")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-base-nyu")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
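The final step above scales the depth map by its maximum before casting to 8-bit. As an alternative sketch (not the method used in the code above), the helper below applies min-max scaling instead, which uses the full 0 to 255 range and avoids a division by zero on a constant map. It assumes NumPy, and `depth_to_uint8` is a hypothetical name introduced here for illustration.

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max scale a depth array to 0..255; a constant map becomes all zeros."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # No depth variation: return zeros rather than dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

demo = depth_to_uint8([[0.0, 1.0], [3.0, 4.0]])
print(demo)
```

The result can be passed to `Image.fromarray` in the same way as `formatted` in the code above. Because the scaling is relative to each image, the output is suited to visualization only and does not preserve absolute depth values.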


📚 Documentation

Intended use

This model was created to show that using DINOv2 as the backbone in the DPT framework yields a strong depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}