dpt-dinov2-large-kitti open-source model: specialized for depth estimation, supporting a variety of depth-measurement applications free of charge

Dpt Dinov2 Large Kitti

Developed by facebook

This model uses the DPT framework with DINOv2 as its backbone network and targets depth estimation.

3D Vision

Transformers

Open-source license: Apache-2.0 #depth estimation #unsupervised learning #visual feature extraction

Downloads: 26

Release date: 11/1/2023

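Model Overview

The DPT (Dense Prediction Transformer) model uses a DINOv2 backbone network to estimate depth from images.

Model Features

DINOv2 backbone network

Adopts DINOv2 as the backbone network, which provides robust visual feature extraction.

Dense Prediction Transformer

Uses the DPT architecture to make dense (per-pixel) predictions, which suits depth estimation tasks.

Model Capabilities

Depth estimation

Image processing

Use Cases

Computer vision

Depth map generation

Generates a depth map from a single image

Produces high-accuracy depth estimates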

🚀 DPT model with a DINOv2 backbone

This is a depth estimation model that combines the DPT (Dense Prediction Transformer) framework with a DINOv2 backbone.

🚀 Quick Start

The DPT (Dense Prediction Transformer) model uses the DINOv2 backbone proposed by Oquab et al. in DINOv2: Learning Robust Visual Features without Supervision.

Figure: DPT architecture. Taken from the original paper.

Resources

Usage with Transformers

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-large-kitti")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-large-kitti")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
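
The two post-processing steps in the snippet above can be isolated as small helpers. The following is a minimal sketch, not part of the original model card: `pil_size_to_hw` and `depth_to_uint8` are hypothetical names introduced here for illustration. The first shows why the snippet passes `image.size[::-1]` (PIL reports `(width, height)`, while `torch.nn.functional.interpolate` expects `(height, width)`). The second is a variant of the visualization step that uses min-max scaling instead of dividing by the maximum only, and that guards against a constant depth map; this is an assumption about what may be convenient for display, not the card's own method.

```python
import numpy as np


def pil_size_to_hw(pil_size):
    """Convert PIL's (width, height) tuple to the (height, width) order
    expected by torch.nn.functional.interpolate. Hypothetical helper."""
    width, height = pil_size
    return (height, width)


def depth_to_uint8(depth):
    """Min-max scale a depth array to the 0..255 range for display.

    Hypothetical helper: the original snippet divides by the maximum only;
    this variant also subtracts the minimum so the full 8-bit range is used.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: return zeros rather than dividing by (almost) zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


# Small synthetic example standing in for a real predicted depth map.
demo_depth = np.array([[2.0, 4.0], [6.0, 10.0]])
demo_gray = depth_to_uint8(demo_depth)
demo_size = pil_size_to_hw((640, 480))
```

With the quick-start variables, `depth_to_uint8(output)` could replace the `formatted = ...` line, and the result can be passed to `Image.fromarray` as before.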

📚 Documentation

Intended uses

This model is intended to showcase that combining the DPT framework with a DINOv2 backbone yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}