The open-source model DPT-DINOv2-Giant-KITTI - A practical tool for depth estimation tasks

Dpt Dinov2 Giant Kitti

Developed by facebook

DPT framework using DINOv2 as the backbone network for depth estimation tasks.

3D Vision

Transformers

Open Source License: Apache-2.0

#Depth Estimation #Unsupervised Learning #Visual Feature Extraction

Downloads: 56

Release Time: 11/1/2023

Model Overview

This model pairs a DINOv2 backbone, which learns visual features without supervision, with the DPT (Dense Prediction Transformer) architecture to estimate depth from a single image.

Model Features

DINOv2 Backbone Network

Uses DINOv2, trained without supervision, as the backbone network for visual feature extraction.

Dense Prediction Transformer

Uses the DPT architecture for dense prediction, which suits per-pixel tasks such as depth estimation.

High-Precision Depth Estimation

Capable of generating high-quality depth maps from single images.

Model Capabilities

Single-Image Depth Estimation

Visual Feature Extraction

Use Cases

Computer Vision

3D Scene Reconstruction

Estimates depth information from 2D images for 3D scene reconstruction.

Generates accurate depth maps
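
One common way to lift a depth map into 3D is pinhole back-projection. The sketch below is illustrative only: it assumes the depth values are measured along the camera z-axis and that the camera intrinsics are known. The function name `backproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are assumptions for this example and are not provided by the model.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Return an (H*W, 3) array of camera-frame points from an (H, W) depth map.

    fx, fy are focal lengths in pixels and cx, cy the principal point
    (all assumed known; they are not an output of the depth model).
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

The points come out in row-major pixel order, so the first row of the result corresponds to the top-left pixel.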

Autonomous Driving

Used for environmental perception and distance estimation in autonomous driving systems.
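
For distance estimation, one simple heuristic is to take the median depth inside a detected object's bounding box. The following is a minimal sketch under stated assumptions: the helper name `median_depth_in_box` and the box format `(x0, y0, x1, y1)` are hypothetical, and the result is a distance in meters only if the depth map itself is metric.

```python
import numpy as np


def median_depth_in_box(depth, box):
    """Median depth inside box = (x0, y0, x1, y1), with x1 and y1 exclusive."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds before slicing.
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("box does not overlap the depth map")
    return float(np.median(depth[y0:y1, x0:x1]))
```

The median is used here rather than the mean because background pixels inside the box tend to skew an average.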

🚀 Model Card: DPT model with DINOv2 backbone

This model combines the DPT framework with a DINOv2 backbone to create a powerful depth estimator.

🚀 Quick Start

To use the model with the Transformers library, you can follow the example code below:

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-giant-kitti")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-giant-kitti")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
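
The visualization step above divides by the maximum value, which assumes the map has a positive maximum. As a hedged alternative, the sketch below uses min-max scaling instead and handles a constant map explicitly. The helper name `depth_to_uint8` is hypothetical and not part of the Transformers library.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a 2D depth map to the 0..255 range for display."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # A constant map has no contrast; return zeros instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)
```

With the variables from the example above, `Image.fromarray(depth_to_uint8(output))` would give a grayscale image. Note that min-max scaling stretches contrast more than dividing by the maximum, so the two images will not look identical.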

✨ Features

The DPT (Dense Prediction Transformer) model with a DINOv2 backbone was proposed in "DINOv2: Learning Robust Visual Features without Supervision" by Oquab et al. and targets depth estimation from a single image.

Figure: DPT architecture. Taken from the original paper.

📚 Documentation

Intended Use

The model is intended to showcase that using the DPT framework with DINOv2 as backbone yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

This model is licensed under the Apache-2.0 license.
