🚀 Depth Anything V2 (Fine-tuned for Metric Depth Estimation) - Transformers Version
This model is a fine-tuned variant of Depth Anything V2 for indoor metric depth estimation, trained on the synthetic Hypersim dataset. The checkpoint is compatible with the transformers library.
Depth Anything V2 was introduced in the paper of the same name by Lihe Yang et al. It shares the same architecture as the original Depth Anything release but uses synthetic data and a larger-capacity teacher model to achieve more precise and robust depth predictions. This fine-tuned version for metric depth estimation was first released in this repository.
✨ Features
- Six metric depth models: checkpoints at three scales (Small, Base, Large) are available for indoor and outdoor scenes.
📦 Installation
The model requires transformers>=4.45.0. You can either install that specific version or install the latest version from source:
pip install git+https://github.com/huggingface/transformers
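If you are unsure which version is installed, a quick check such as the following can help (a minimal sketch; the packaging module ships as a transformers dependency):
import transformers
from packaging import version
# This checkpoint needs transformers 4.45.0 or newer.
assert version.parse(transformers.__version__) >= version.parse("4.45.0"), (
    f"Found transformers {transformers.__version__}; please upgrade to >= 4.45.0."
)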
💻 Usage Examples
Basic Usage
Here is how to use this model to perform zero-shot depth estimation:
from transformers import pipeline
from PIL import Image
import requests
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf")
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
depth = pipe(image)["depth"]
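The pipeline returns a dictionary whose "depth" entry is a PIL image, alongside the raw "predicted_depth" tensor. A minimal sketch for saving and inspecting the result (the output filename is just an example):
result = pipe(image)
result["depth"].save("depth.png")       # rendered depth map (PIL image); example filename
print(result["predicted_depth"].shape)  # raw depth tensor returned by the model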
Advanced Usage
Alternatively, you can use the model and processor classes:
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf")
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
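Continuing from the snippet above, the interpolated tensor can be converted to a NumPy array for downstream use or rendered for a quick visual check; a minimal sketch (the min-max normalization is only for visualization, and the filename is an example):
depth_map = prediction.squeeze().cpu().numpy()  # per-pixel depth values from the metric model
depth_vis = (depth_map - depth_map.min()) / (depth_map.max() - depth_map.min()) * 255.0
Image.fromarray(depth_vis.astype(np.uint8)).save("depth_metric.png")  # example filename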
For more code examples, please refer to the documentation.
📚 Documentation
Model description
Depth Anything V2 leverages the DPT architecture with a DINOv2 backbone. The model is trained on ~600K synthetic labeled images and ~62 million real unlabeled images, achieving state-of-the-art results for both relative and absolute depth estimation.

Figure: Depth Anything overview, taken from the original paper.
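To see how these pieces are wired together in the transformers implementation, you can inspect the checkpoint's configuration. A small sketch (attribute names follow the DepthAnything config class in transformers; the commented values are what this checkpoint is expected to report):
from transformers import AutoConfig
config = AutoConfig.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf")
print(type(config).__name__)              # expected: DepthAnythingConfig
print(config.backbone_config.model_type)  # expected: dinov2
print(config.depth_estimation_type)       # expected: "metric" for this fine-tuned checkpoint
print(config.max_depth)                   # depth range used to scale the metric head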
Available Models
Six metric depth models are available, covering three scales (Small, Base, Large) for indoor and outdoor scenes:
- depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf
- depth-anything/Depth-Anything-V2-Metric-Indoor-Base-hf
- depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf
- depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf
- depth-anything/Depth-Anything-V2-Metric-Outdoor-Base-hf
- depth-anything/Depth-Anything-V2-Metric-Outdoor-Large-hf
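Any of these checkpoints can be dropped into the same pipeline call; for example, to switch to an outdoor variant (a sketch reusing the pipeline API shown above):
from transformers import pipeline
outdoor_pipe = pipeline(
    task="depth-estimation",
    model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf",
)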
📄 Citation
If you use this model in your research, please cite the following papers:
@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}