Depth Anything V2 (Fine-tuned for Metric Depth Estimation) - Transformers Version
This model is a fine-tuned variant of Depth Anything V2 for outdoor metric depth estimation. It was fine-tuned on the synthetic Virtual KITTI dataset and is compatible with the transformers library.
Depth Anything V2, introduced in the paper of the same name by Lihe Yang et al., shares the architecture of the original Depth Anything, but uses synthetic training data and a larger-capacity teacher model to produce finer and more robust depth predictions. This fine-tuned version for metric depth estimation was first released in this repository.
Features
- Multiple Model Scales: Six metric depth models across three scales (Small, Base, Large) are available for both indoor and outdoor scenes.
- State-of-the-Art Performance: Trained on large-scale synthetic and real data, the models achieve state-of-the-art results in both relative and absolute depth estimation.
Installation
Requirements
transformers>=4.45.0
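For example, to install a released version from PyPI:
pip install "transformers>=4.45.0"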
Alternatively, you can install the latest version of transformers from source:
pip install git+https://github.com/huggingface/transformers
Usage Examples
Basic Usage
Here is how to use this model to perform zero-shot depth estimation:
from transformers import pipeline
from PIL import Image
import requests
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
depth = pipe(image)["depth"]
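For a single image, the pipeline returns a dictionary with a rendered depth image ("depth") and the raw model output ("predicted_depth"). A minimal sketch of using both, assuming these standard depth-estimation pipeline keys:
result = pipe(image)
result["depth"].save("depth.png")       # PIL image, convenient for quick visualization
print(result["predicted_depth"].shape)  # raw depth tensor produced by the model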
Advanced Usage
You can also use the model and processor classes:
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
# prepare the image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate the prediction to the original image size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
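The interpolated prediction is a metric depth map; one way to inspect it is to normalize it into an 8-bit grayscale image, as sketched below (the normalization here is purely for visualization):
depth = prediction.squeeze().cpu().numpy()
formatted = (depth * 255 / np.max(depth)).astype("uint8")
depth_image = Image.fromarray(formatted)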
For more code examples, please refer to the documentation.
Documentation
Model description
Depth Anything V2 is built on the DPT architecture with a DINOv2 backbone. It is trained on approximately 600K synthetic labeled images and 62 million real unlabeled images, delivering excellent performance in depth estimation tasks.
(Figure: Depth Anything overview, taken from the original paper.)
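To see how the checkpoint is configured in transformers, you can inspect its configuration; a quick sketch (the exact fields printed depend on your transformers version):
from transformers import AutoConfig

config = AutoConfig.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
print(config.model_type)       # "depth_anything"
print(config.backbone_config)  # DINOv2 backbone settings used by the DPT head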
Intended uses & limitations
You can use the raw model for zero-shot depth estimation. Check the model hub for other versions that suit your needs.
License
The model is subject to the license terms specified in the original Depth Anything V2 repository and paper.
Available Models
Six metric depth checkpoints are available: Small, Base, and Large variants for both indoor and outdoor scenes, published under the depth-anything organization on the Hugging Face Hub.
Citation
@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}