Depth Anything, an open-source model for depth estimation. Trained on a massive number of images, it performs zero-shot depth estimation.

Depth Anything Base Hf

Developed by LiheYoung

Depth Anything is a depth estimation model based on the DPT architecture and DINOv2 backbone network, trained on approximately 62 million images, achieving state-of-the-art performance in zero-shot depth estimation.

3D Vision

Transformers

Open Source License: Apache-2.0 #Zero-shot depth estimation #Large-scale unsupervised training #DPT architecture

Downloads 4,101

Release Time: 1/22/2024

Model Overview

This model predicts depth information from a single image (monocular depth estimation) and is suitable for a range of computer vision applications.

Model Features

Large-scale training data

The model is trained on approximately 62 million images, providing strong generalization capabilities.

Zero-shot depth estimation

Can be directly applied to depth estimation tasks in various scenarios without domain-specific fine-tuning.

Advanced architecture

Combines the DPT architecture and DINOv2 backbone network to achieve high-performance depth prediction.

Model Capabilities

Single-image depth estimation

Zero-shot prediction

Computer vision analysis

Use Cases

Computer vision

3D scene reconstruction

Predict depth information from a single 2D image to assist in 3D scene reconstruction

Augmented reality

Provide scene depth information for AR applications
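
For the 3D scene reconstruction use case above, a depth map is typically turned into 3D points by back-projecting each pixel through a pinhole camera model. The sketch below illustrates that step only. The function name backproject and the intrinsics fx, fy, cx, cy are assumptions introduced for illustration, not part of the model or the Transformers API. It also assumes the depth values are already metric; the raw output of this checkpoint may be relative rather than metric, in which case it would need to be converted first.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) array of camera-space points.

    Hypothetical helper: assumes a pinhole camera and depth in metric units.
    """
    depth = np.asarray(depth, dtype=np.float64)
    height, width = depth.shape
    # Pixel coordinate grids: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Illustrative values only: a flat 2x3 depth map and made-up intrinsics.
depth = np.full((2, 3), 2.0)
points = backproject(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
print(points.shape)
```

Real intrinsics would come from camera calibration or image metadata; the values here are placeholders.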

🚀 Depth Anything (base-sized model, Transformers version)

Depth Anything is a model for depth estimation, leveraging large-scale unlabeled data to achieve state-of-the-art results.

🚀 Quick Start

You can use the Depth Anything model for zero-shot depth estimation tasks. For other versions of the model, you can search on the model hub.

✨ Features

Leverages the DPT architecture with a DINOv2 backbone.
Trained on approximately 62 million images, achieving state-of-the-art results for both relative and absolute depth estimation.

📦 Installation

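The original model card gives no installation steps. The usage examples below import the transformers, torch, numpy, Pillow (PIL), and requests Python packages, so these need to be available in your environment.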

💻 Usage Examples

Basic Usage

from transformers import pipeline
from PIL import Image
import requests

# load the depth-estimation pipeline (the checkpoint is downloaded on first use)
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference: the pipeline returns a dict; "depth" holds the depth map as a PIL image
depth = pipe(image)["depth"]

Advanced Usage

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-base-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-base-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to the original size; PIL's image.size is (width, height), so it is reversed to (height, width)
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
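
The interpolated prediction above is a float tensor of shape (1, 1, H, W), which cannot be viewed directly as an image. A minimal sketch of one common way to rescale such a map to 8-bit grayscale follows. The helper name depth_to_uint8 and the choice of scaling by the maximum value are illustrative assumptions, not part of the model's API. A small synthetic array stands in for the model output so the block runs without downloading weights.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Rescale a non-negative float depth map to 0-255 (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    max_value = float(depth.max())
    if max_value <= 0.0:
        # Avoid dividing by zero for an all-zero map.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = depth * (255.0 / max_value)
    # Clip before casting so out-of-range values cannot wrap around.
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Synthetic stand-in for prediction.squeeze().cpu().numpy() from the example above.
fake_depth = np.array([[0.0, 1.0], [2.0, 4.0]], dtype=np.float32)
gray = depth_to_uint8(fake_depth)
print(gray.shape, gray.dtype)
```

With Pillow installed, Image.fromarray(gray) turns the result into an image that can be saved or displayed.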

For more code examples, refer to the documentation.

📚 Documentation

Model description

Depth Anything leverages the DPT architecture with a DINOv2 backbone.

The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.

[Figure: Depth Anything overview. Taken from the original paper.]

Intended uses & limitations

You can use the raw model for tasks like zero-shot depth estimation. See the model hub to look for other versions on a task that interests you.

BibTeX entry and citation info

@misc{yang2024depth,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
      year={2024},
      eprint={2401.10891},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

The model is licensed under the Apache-2.0 license.

Property	Details
Model Type	Depth Anything (base-sized model, Transformers version)
Training Data	Approximately 62 million images
