DPT-DINOv2-Large-NYU: Open-Source Model for Depth Estimation

DPT DINOv2 Large NYU

Developed by facebook

This model employs a DPT framework built on the DINOv2 backbone for depth estimation tasks.

3D Vision

Transformers

Open Source License: Apache-2.0 #Depth Estimation #Unsupervised Visual Features #Dense Prediction Transformer

Downloads 80

Release Time: 10/31/2023

Model Overview

The model pairs the DPT (Dense Prediction Transformer) framework with a DINOv2 backbone to predict a depth map from a single input image.

Model Features

DINOv2 Backbone

Uses DINOv2 visual features, which are learned without supervision, as the backbone representation.

Dense Prediction Transformer

Uses the DPT framework to produce a dense, per-pixel prediction, which makes it suitable for depth estimation.

High-Quality Depth Estimation

Generates a dense depth map with one depth value per pixel of the input image.

Model Capabilities

Image Depth Estimation

Visual Feature Extraction

Use Cases

Computer Vision

Scene Depth Estimation

Performs depth estimation on input images to generate corresponding depth maps.

Produces high-quality depth maps for applications such as 3D reconstruction and augmented reality.
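As an illustration of the 3D reconstruction use case, a depth map can be back-projected into a point cloud with the standard pinhole camera model. The sketch below is a minimal example under stated assumptions: the helper name `depth_to_points` is hypothetical, the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values that must come from your own camera (the model card does not provide any), and the depth values are assumed to be in consistent units.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) array of XYZ points.

    Hypothetical helper: fx, fy, cx, cy are pinhole intrinsics in pixels
    and are assumptions supplied by the caller, not by the model.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel, both (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat 2x3 depth map with made-up intrinsics
depth = np.full((2, 3), 2.0)
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

With a real prediction, the `depth` argument would be the resized depth array from the Quick Start example, and the resulting points could be passed to any point-cloud viewer.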

🚀 Model Card: DPT model with DINOv2 backbone

This model card presents a DPT (Dense Prediction Transformer) model with a DINOv2 backbone. It's designed to demonstrate the effectiveness of using DINOv2 in the DPT framework for depth estimation.

🚀 Quick Start

Use with Transformers

The following Python example runs the model on a sample image, resizes the predicted depth to the original resolution, and converts it to a grayscale image:

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-large-nyu")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-large-nyu")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

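# interpolate to original size; PIL's image.size is (width, height),
# so it is reversed to the (height, width) order that interpolate expects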
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
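The visualization step above scales by the maximum depth only. That is undefined for an all-zero map and uses only part of the 0-255 range when the minimum depth is well above zero. A minimal sketch of a min-max variant follows; the helper name `depth_to_uint8` is hypothetical and not part of Transformers, and it assumes only a NumPy array such as `output` from the example above.

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map to the 0..255 range (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    span = d_max - d_min
    if span <= 0.0:
        # Constant map: nothing to stretch, so return an all-black image
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / span  # values now lie in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)
```

In the example above, this would replace the manual scaling, for instance `depth = Image.fromarray(depth_to_uint8(output))`.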

✨ Features

Model Details

The DPT model with a DINOv2 backbone was proposed in "DINOv2: Learning Robust Visual Features without Supervision" by Oquab et al.

DPT architecture. Taken from the original paper.

📚 Documentation

Model Use

Intended Use

The model is intended to showcase that using the DPT framework with DINOv2 as backbone yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

This model is licensed under the Apache-2.0 license.
