dpt-dinov2-base-kitti open-source model: free to use for depth estimation tasks

Dpt Dinov2 Base Kitti

Developed by facebook

A DPT framework that uses DINOv2 as its backbone, for depth estimation tasks

3D Vision

Transformers

Open-source license: Apache-2.0 #DepthEstimation #DINOv2Backbone #DensePrediction

Downloads: 446

Release date: 10/31/2023

Model Overview

This is a DPT (Dense Prediction Transformer) model built on a DINOv2 backbone and used mainly for depth estimation. It is meant to show that DINOv2 works as a strong backbone for depth estimation.

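Model Features

DINOv2 backbone

Uses DINOv2 as the backbone for visual feature extraction

Depth estimation

Specialized for estimating depth from a single image

Dense prediction

Adopts the Dense Prediction Transformer architecture to produce a per-pixel depth map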

Model Capabilities

Monocular depth estimation

Image depth analysis

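Use Cases

Computer vision

3D scene reconstruction

Estimates depth from a single image for use in 3D scene reconstruction

Produces a dense depth map

Autonomous driving

Used for environment perception in autonomous driving systems

Provides depth information about the scene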
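
For the 3D scene reconstruction use case above, a depth map is usually back-projected into a point cloud with a pinhole camera model. The following is a minimal sketch under stated assumptions, not part of this model or of Transformers: `depth_to_points` is a hypothetical helper, the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values that would have to come from camera calibration, and the depth values are assumed to be distances along the optical axis in a consistent unit.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), all in pixels.
    """
    h, w = depth.shape
    # u holds column indices, v holds row indices, both shaped (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Synthetic flat depth map standing in for a real prediction.
depth = np.full((2, 3), 2.0)
# Placeholder intrinsics chosen only for illustration.
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

With a real prediction, `depth` would be the resized model output, and the quality of the point cloud depends on how well the predicted scale matches the true scene scale.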

🚀 DPT Model with a DINOv2 Backbone

This is a DPT (Dense Prediction Transformer) model with DINOv2 as its backbone, intended for depth estimation.

🚀 Quick Start

This section explains the basic usage of the model.

✨ Key Features

Uses DINOv2 as the backbone, whose visual features are learned without supervision.
Uses the DPT framework for dense, per-pixel depth estimation.

📚 Documentation

Model details

A DPT (Dense Prediction Transformer) model with a DINOv2 backbone, as proposed in DINOv2: Learning Robust Visual Features without Supervision by Oquab et al.

[Figure: DPT architecture. Taken from the original paper.]

Usage with Transformers

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-base-kitti")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-base-kitti")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
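
The visualization step in the snippet above divides by the maximum only, so contrast is low when all depths are far from zero, and an all-zero map would divide by zero. A minimal sketch of a min-max variant follows; `depth_to_uint8` is a hypothetical helper introduced here for illustration, not a Transformers or Pillow function, and the input array is synthetic.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Hypothetical helper: min-max scale a depth map to the 0-255 range."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: avoid division by zero and return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return (scaled * 255.0).round().astype(np.uint8)


# Synthetic stand-in for `output` from the snippet above.
fake_depth = np.array([[1.0, 2.0], [3.0, 5.0]])
gray = depth_to_uint8(fake_depth)
```

The result can be passed to `Image.fromarray` in the same way as `formatted` above.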

🔧 Technical Details

Intended uses

This model is intended to show that using the DPT framework with DINOv2 as the backbone yields a strong depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}