# Depth Anything (large-sized model, Transformers version)

Depth Anything is a model for monocular depth estimation: given a single image, it predicts a dense depth map.
## Quick Start

Depth Anything was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in this repository. An online demo is also available.
Disclaimer: The team releasing Depth Anything did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Features

- **Advanced Architecture**: Depth Anything leverages the DPT architecture with a DINOv2 backbone.
- **Large-Scale Training**: Trained on approximately 62 million images, it achieves state-of-the-art results for both relative and absolute depth estimation.

*Depth Anything overview. Taken from the original paper.*
## Documentation

### Intended uses & limitations

You can use the raw model for tasks like zero-shot depth estimation. Check the model hub to find other versions for tasks that interest you.

### How to use

#### Basic Usage

Here is how to use this model to perform zero-shot depth estimation:
```python
from transformers import pipeline
from PIL import Image
import requests

# Load the depth estimation pipeline with the large Depth Anything checkpoint
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-large-hf")

# Load an example image from the COCO dataset
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference; the pipeline returns the depth map under the "depth" key
depth = pipe(image)["depth"]
```
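The `depth` value returned by the pipeline is a PIL image, so it can be displayed or saved directly. A minimal sketch (the output filename is an arbitrary choice):

```python
# Save the predicted depth map; "depth.png" is an arbitrary output path
depth.save("depth.png")
```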
#### Advanced Usage
Alternatively, one can use the classes themselves:
```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

# Load an example image from the COCO dataset
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and model
image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")

# Prepare the image for the model
inputs = image_processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# Interpolate the prediction back to the original image size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
```
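To inspect the result, the interpolated tensor can be normalized and converted to an 8-bit grayscale image. A minimal sketch, assuming the max-value normalization below (one common choice, not prescribed by the model card):

```python
# Normalize depth values to the 0-255 range and convert to a grayscale image.
# Dividing by the maximum is a simple normalization choice; min-max scaling also works.
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth_image = Image.fromarray(formatted)
depth_image.save("depth.png")
```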
For more code examples, refer to the documentation.
### BibTeX entry and citation info
```bibtex
@misc{yang2024depth,
    title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
    author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
    year={2024},
    eprint={2401.10891},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
## License

This model is released under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Depth Anything (large-sized model, Transformers version) |
| Training Data | Approximately 62 million images |