DPT-DINOv2-Base-NYU Open-Source Model - Precise Depth Estimation for Free

DPT DINOv2 Base NYU

Developed by facebook

A DPT model using DINOv2 as the backbone network for depth estimation tasks.

3D Vision

Transformers

Open Source License: Apache-2.0

#Depth Estimation #DINOv2 Backbone #Dense Prediction

Downloads 146

Release Time: 10/31/2023

Model Overview

This model combines DINOv2's visual feature extraction capability with the DPT framework to achieve high-quality depth estimation.

Model Features

DINOv2 Backbone

Utilizes DINOv2 as the backbone network, providing powerful visual feature extraction capabilities.

DPT Framework

Employs the Dense Prediction Transformer (DPT) architecture for high-precision depth estimation.

Unsupervised Learning

The DINOv2 backbone was pretrained without supervision (self-supervised learning), which gives it robust, general-purpose visual features.

Model Capabilities

Depth Estimation

Image Analysis

Use Cases

Computer Vision

Scene Depth Estimation

Estimates depth information for each pixel in the image.

Generates a depth map of the same size as the input image.
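The "same size as the input image" step depends on a shape convention that is easy to get wrong: PIL reports an image size as (width, height), while arrays and tensors are indexed as (height, width). The sketch below illustrates this with NumPy only. The helper `resize_nearest` and the sample sizes are hypothetical, and nearest-neighbour sampling is used here only as a simple stand-in for the bicubic interpolation in the usage example further down.

```python
import numpy as np


def resize_nearest(depth, target_hw):
    """Nearest-neighbour resize of a 2-D depth map to (height, width).

    Hypothetical helper for illustration; not part of any library.
    """
    src_h, src_w = depth.shape
    dst_h, dst_w = target_hw
    # For each output row/column, pick the source index it maps back to.
    rows = (np.arange(dst_h) * src_h) // dst_h
    cols = (np.arange(dst_w) * src_w) // dst_w
    return depth[rows[:, None], cols[None, :]]


# PIL's Image.size is (width, height); this value is an assumed example.
pil_size = (640, 480)
# Reverse it to get the (height, width) order used by arrays and tensors.
target_hw = pil_size[::-1]

# Stand-in for a low-resolution depth prediction.
low_res = np.arange(12, dtype=np.float32).reshape(3, 4)
full_res = resize_nearest(low_res, target_hw)
print(full_res.shape)
```

Passing `pil_size` without reversing it would produce a map with height and width swapped, which only goes unnoticed for square images.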

🚀 Model Card: DPT model with DINOv2 backbone

This model card introduces a DPT (Dense Prediction Transformer) model with a DINOv2 backbone. It demonstrates that using the DPT framework with DINOv2 as the backbone can create a powerful depth estimator.

✨ Features

Utilizes the DPT framework with a DINOv2 backbone for depth estimation.
Can be easily integrated with the Transformers library.

📦 Installation

The model runs through the Transformers library. Install it, together with the other packages the usage example imports, with the following command:

pip install transformers torch pillow requests
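A quick way to confirm the installation is to check that each package can be found by the interpreter. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the `REQUIRED` mapping are assumptions made for illustration; note that the `pillow` package is imported under the module name `PIL`.

```python
import importlib.util

# pip package name -> importable module name (assumed mapping for this card)
REQUIRED = {
    "transformers": "transformers",
    "torch": "torch",
    "pillow": "PIL",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be located."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


# An empty list means every package in REQUIRED was found.
print(missing_packages())
```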

💻 Usage Examples

Basic Usage

from transformers import AutoImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("facebook/dpt-dinov2-base-nyu")
model = DPTForDepthEstimation.from_pretrained("facebook/dpt-dinov2-base-nyu")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

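# interpolate to original size; PIL's image.size is (width, height),
# so reverse it to the (height, width) order that interpolate expects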
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction: scale to 0-255 and convert to an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
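The visualization step above divides by `np.max(output)`, which is undefined for an all-zero map and leaves the darkest pixel above black whenever the minimum depth is not zero. A min-max variant is sketched below as one possible alternative. The helper `depth_to_uint8` is hypothetical and is not part of the original example or of the Transformers library.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a depth array to the 0..255 range as uint8.

    Hypothetical helper: a variant of the max-only scaling used above.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: there is no range to stretch, so return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


# Tiny stand-in for the `output` array from the example above.
demo = np.array([[1.0, 2.0], [3.0, 5.0]])
gray = depth_to_uint8(demo)
print(gray.dtype, gray.min(), gray.max())
```

With the real prediction, `Image.fromarray(depth_to_uint8(output))` would give a grayscale image that can be written to disk with its `save` method. Because the scaling is relative to each image, the resulting pixel values are suitable for viewing only and do not represent absolute depth.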

📚 Documentation

Model Details

DPT (Dense Prediction Transformer) model with DINOv2 backbone as proposed in DINOv2: Learning Robust Visual Features without Supervision by Oquab et al.

[Figure: DPT architecture. Taken from the original paper.]

Model Use

Intended Use

The model is intended to showcase that using the DPT framework with DINOv2 as backbone yields a powerful depth estimator.

BibTeX entry and citation info

@misc{oquab2023dinov2,
      title={DINOv2: Learning Robust Visual Features without Supervision}, 
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2023},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

This model is licensed under the Apache-2.0 license.
