dpt-dinov2-small-nyu: an open-source model for single-image depth estimation

Dpt Dinov2 Small Nyu

Developed by facebook

DPT model using DINOv2 as backbone for depth estimation tasks.

3D Vision

Transformers

Open Source License: Apache-2.0

Tags: Depth Estimation, DINOv2 Backbone, Dense Prediction

Downloads 23

Release Date: 10/31/2023

Model Overview

This model combines DINOv2's visual feature extraction capability with DPT's dense prediction architecture, specifically designed for depth estimation from single images.

Model Features

DINOv2 Backbone

Builds on the robust visual features that DINOv2 learns through self-supervised pretraining, without labeled data

Dense Prediction Architecture

Capable of generating high-resolution depth map predictions

Efficient Inference

Uses the small model variant, which is suited to practical deployment

Model Capabilities

Single-image depth estimation

High-resolution depth map generation

Use Cases

Computer Vision

3D Scene Reconstruction

Estimating scene depth information from single RGB images

Can be used to construct 3D scene models
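
As a rough illustration of how a predicted depth map could feed a 3D reconstruction, the sketch below back-projects each pixel into a camera-space point using a pinhole camera model. This is not part of the model or of the Transformers library. The function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the sketch assumes the depth values are in consistent units and that the camera intrinsics are known.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) point array.

    Assumes a pinhole camera; fx, fy, cx, cy are illustrative intrinsics
    that must come from your own camera calibration.
    """
    depth = np.asarray(depth, dtype=np.float32)
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(
        np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32)
    )
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy input: a flat surface 2 units away, with made-up intrinsics.
depth = np.full((4, 6), 2.0, dtype=np.float32)
points = depth_to_points(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
```

In practice, `depth` would be the model's prediction resized to the image resolution, and the resulting points could be passed to a point-cloud or meshing tool of your choice.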

Augmented Reality

Providing scene depth information for AR applications

Enhances interaction between virtual objects and real scenes
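
One way scene depth can support this interaction is occlusion handling: a virtual object is drawn only where it is closer to the camera than the real scene. The sketch below is a minimal, hypothetical example and not an API of any AR framework. It assumes both depth maps share the same camera view and units, and that pixels the virtual object does not cover are marked with infinity.

```python
import numpy as np


def visible_mask(scene_depth, object_depth):
    """Return a boolean mask of pixels where the virtual object is visible.

    A pixel is visible when the virtual object is strictly closer to the
    camera than the estimated real scene. Pixels not covered by the object
    are expected to hold np.inf (a convention of this sketch).
    """
    scene_depth = np.asarray(scene_depth, dtype=np.float32)
    object_depth = np.asarray(object_depth, dtype=np.float32)
    return object_depth < scene_depth


# Toy 2x2 example with made-up depth values.
scene = np.array([[1.0, 3.0], [2.0, 4.0]])
virtual = np.array([[2.0, 2.0], [np.inf, 1.0]])
mask = visible_mask(scene, virtual)
```

A renderer could then composite the virtual object's color only at pixels where `mask` is true.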

🚀 Model Card: DPT model with DINOv2 backbone

This model shows that the DPT framework with DINOv2 as the backbone yields a powerful depth estimator.

🚀 Quick Start

Model Details

This is a DPT (Dense Prediction Transformer) model with a DINOv2 backbone, as proposed in "DINOv2: Learning Robust Visual Features without Supervision" by Oquab et al.

DPT architecture. Taken from the original paper.

Resources

Use with Transformers

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-small-nyu")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-small-nyu")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
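
The visualization step above divides by the maximum only, so the darkest pixel is black only when the smallest predicted depth is close to zero. If more contrast is wanted, a min-max rescale is one option. The helper below is a sketch with a hypothetical name (`depth_to_uint8`), not part of Transformers, and it runs on a small made-up array so it does not require the model.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a float depth map to the 0..255 range as uint8."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: avoid division by zero and return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


# Toy input standing in for `output` from the snippet above.
demo = np.array([[1.0, 2.0], [3.0, 5.0]], dtype=np.float32)
gray = depth_to_uint8(demo)
```

With the real prediction, `Image.fromarray(depth_to_uint8(output))` would produce a grayscale image in the same way as the original snippet.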

📚 Documentation

Model Use

Intended Use

The model is intended to showcase that using the DPT framework with DINOv2 as the backbone yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

The model is licensed under the Apache-2.0 license.
