DPT - DINOv2 Base KITTI Open-Source Model - Free for Accurate Depth Estimation Tasks

DPT DINOv2 Base KITTI

Developed by facebook

DPT framework using DINOv2 as backbone for depth estimation tasks

3D Vision

Transformers

Open Source License: Apache-2.0

Tags: #Depth Estimation #DINOv2 Backbone #Dense Prediction

Downloads 446

Release Time: 10/31/2023

Model Overview

This model is a DPT (Dense Prediction Transformer) that uses DINOv2 as its backbone and is intended for monocular depth estimation. Its purpose is to show that DINOv2 works well as a backbone network for this task.

Model Features

DINOv2 Backbone

Uses DINOv2 as the backbone network for visual feature extraction

Depth Estimation

Designed to estimate depth from a single image

Dense Prediction

Uses the Dense Prediction Transformer (DPT) architecture to produce a depth value for every pixel

Model Capabilities

Monocular Depth Estimation

Image Depth Analysis

Use Cases

Computer Vision

3D Scene Reconstruction

Estimates depth information from a single image for 3D scene reconstruction

Generates high-precision depth maps
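
As an illustration of the reconstruction use case, the sketch below back-projects a depth map into a 3D point cloud with the standard pinhole camera model. It is a minimal example, not part of the model's API: the helper name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and it assumes the depth values are already in consistent distance units, which this page does not state.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) array of XYZ points.

    fx, fy are focal lengths in pixels and cx, cy the principal point.
    All four are assumed to come from your own camera calibration.
    """
    height, width = depth.shape
    # Pixel grids: u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    z = depth.astype(np.float64)
    # Pinhole model: a pixel (u, v) at depth z maps to camera coordinates (x, y, z).
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Toy input: a flat surface 2 units away, seen by a made-up 3x2 pixel camera.
toy_depth = np.full((2, 3), 2.0)
points = depth_to_point_cloud(toy_depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

In practice the `depth` argument would be the resized prediction array produced in the Quick Start section, and the intrinsics would come from the camera that captured the image.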

Autonomous Driving

Used for environmental perception in autonomous driving systems

Provides scene depth information

🚀 Model Card: DPT model with DINOv2 backbone

This model showcases using the DPT framework with DINOv2 as backbone to build a powerful depth estimator.

🚀 Quick Start

The following code demonstrates how to use the model for depth estimation:

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-base-kitti")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-base-kitti")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

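# interpolate to original size; PIL's image.size is (width, height),
# so it is reversed into the (height, width) order that interpolate expects
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),  # add a channel dimension: (batch, 1, H, W)
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction: rescale so the largest depth value maps to 255
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)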
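
The visualization step above divides by the maximum value, so the darkest pixel is only black when the minimum depth is zero, and a map whose maximum is zero would cause a division by zero. A possible alternative is min-max scaling with a guard for constant maps. The helper name `depth_to_uint8` is hypothetical and this is only a sketch of one reasonable choice:

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a depth array to the 0..255 range as uint8.

    A constant map (no depth variation) is returned as all zeros
    instead of triggering a division by zero.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < 1e-12:
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)


# Small stand-in array; with the real model, pass the resized prediction instead.
demo = depth_to_uint8([[1.0, 2.0], [3.0, 5.0]])
flat = depth_to_uint8(np.ones((2, 2)))
```

The returned array can be passed to `Image.fromarray` in the same way as `formatted` in the snippet above.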

✨ Features

The DPT (Dense Prediction Transformer) model with a DINOv2 backbone was proposed in "DINOv2: Learning Robust Visual Features without Supervision" by Oquab et al. It combines DINOv2 features with the DPT framework for depth estimation.

DPT architecture. Taken from the original paper.

📚 Documentation

Intended Use

The model is intended to showcase that using the DPT framework with DINOv2 as backbone yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

The model is licensed under the Apache 2.0 license.
