Depth Anything Small HF Open-Source Depth Estimation Model - Free Deployment for Accurately Completing Depth Estimation Tasks

Depth Anything Small Hf

Developed by LiheYoung

Depth Anything is a depth estimation model based on the DPT architecture, utilizing the DINOv2 backbone network. It was trained on approximately 62 million images and excels in both relative and absolute depth estimation tasks.

3D Vision

Transformers

Open Source License: Apache-2.0 #Zero-shot depth estimation #Large-scale unsupervised training #DPT architecture

Downloads 97.89k

Release Time: 1/22/2024

Model Overview

This model is designed for zero-shot depth estimation tasks, capable of predicting depth information from a single image.

Model Features

Large-scale training data

Trained on approximately 62 million images, enhancing the model's generalization capability.

Zero-shot depth estimation

Can be directly applied to depth estimation tasks without fine-tuning.

Advanced architecture

Utilizes the DPT architecture and DINOv2 backbone network, combining the strengths of both technologies.

Model Capabilities

Single-image depth estimation

Zero-shot learning

Use Cases

Computer vision

3D scene reconstruction

Estimates depth information from a single image for 3D scene reconstruction.

Generates accurate depth maps

Augmented reality

Provides scene depth information for AR applications.

Improves interaction between virtual objects and real scenes
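
The 3D scene reconstruction use case above usually means turning a depth map into 3D points. The following is a minimal sketch of pinhole back-projection, not part of the model or its library. It assumes a depth map that is already in metric units (how to obtain one from this checkpoint is outside the sketch) and camera intrinsics `fx`, `fy`, `cx`, `cy`. The function name and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) array of XYZ points.

    Assumes a pinhole camera and depth values measured along the optical axis.
    """
    h, w = depth.shape
    # Pixel grids: u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy input: a flat surface 2 units from the camera (hypothetical values).
depth = np.full((4, 6), 2.0)
points = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
```

With real data, `depth` would come from a depth estimate resized to the image resolution, and the intrinsics from camera calibration.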

🚀 Depth Anything (small-sized model, Transformers version)

Depth Anything is a depth estimation model that uses large-scale unlabeled data to achieve state-of-the-art results.

🚀 Quick Start

The Depth Anything model was introduced in the paper "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data" by Lihe Yang et al. and first released in the authors' repository. An online demo is also provided.

Disclaimer: The team releasing Depth Anything did not write a model card for this model, so this model card has been written by the Hugging Face team.

✨ Features

Leverages the DPT architecture with a DINOv2 backbone.
Trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.

Property	Details
Model Type	Depth estimation model based on DPT architecture with DINOv2 backbone
Training Data	Approximately 62 million images

Depth Anything overview. Taken from the original paper.

📚 Documentation

Intended uses & limitations

You can use the raw model for tasks like zero-shot depth estimation. See the model hub to look for other versions on a task that interests you.

💻 Usage Examples

Basic Usage

from transformers import pipeline
from PIL import Image
import requests

# load pipe
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference
depth = pipe(image)["depth"]
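
The pipeline returns the depth map under the "depth" key as a PIL image, so it can be saved or combined with the input like any other image. The sketch below shows one way to build a side-by-side preview. The helper name `side_by_side` is hypothetical, and solid-color stand-in images are used so the example runs without downloading the model.

```python
from PIL import Image

def side_by_side(image, depth):
    """Place the input image and its depth map next to each other on one canvas."""
    # Convert the depth map to RGB and match the input size before pasting.
    depth_rgb = depth.convert("RGB").resize(image.size)
    canvas = Image.new("RGB", (image.width * 2, image.height))
    canvas.paste(image.convert("RGB"), (0, 0))
    canvas.paste(depth_rgb, (image.width, 0))
    return canvas

# Stand-ins for `image` and `depth` from the snippet above (assumed values).
image = Image.new("RGB", (64, 48), (200, 30, 30))
depth = Image.new("L", (64, 48), 128)
preview = side_by_side(image, depth)
```

With the real pipeline output, `preview.save("preview.png")` would write the comparison to disk.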

Advanced Usage

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-small-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-small-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
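
The interpolated `prediction` is a floating-point tensor, so it has to be rescaled before it can be viewed as an image. The sketch below shows one possible min-max normalization to 8-bit values. The helper name `depth_to_uint8` is hypothetical, and a small NumPy array stands in for `prediction.squeeze().cpu().numpy()` so the example runs without the model.

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max scale a 2-D depth array to the 0-255 range as uint8."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max == d_min:
        # A constant map has no contrast; return all zeros instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 255.0).round().astype(np.uint8)

# Stand-in for the model output (assumed values, not real predictions).
fake_prediction = np.array([[0.0, 1.0], [2.0, 4.0]])
depth_u8 = depth_to_uint8(fake_prediction)
```

The resulting array can be passed to `Image.fromarray` from PIL to save or display the depth map.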

For more code examples, see the documentation.

BibTeX entry and citation info

@misc{yang2024depth,
      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
      author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
      year={2024},
      eprint={2401.10891},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📄 License

This project is licensed under the Apache-2.0 license.
