Depth-Anything-V2-Metric-Outdoor-Base-hf: Open-Source Model for Outdoor Metric Depth Estimation

Depth Anything V2 Metric Outdoor Base Hf

Developed by depth-anything

A Depth Anything V2 variant fine-tuned for outdoor metric depth estimation on the synthetic Virtual KITTI dataset and compatible with the transformers library.

3D Vision

Transformers

#Outdoor Depth Estimation #Metric Depth Prediction #Zero-shot Learning


Release Time: 7/26/2024

Model Overview

This model performs metric depth estimation for outdoor scenes: it predicts absolute depth values for the content of an image rather than only a relative ordering.

Model Features

Outdoor Scene Optimization

Specially fine-tuned for metric depth estimation in outdoor scenes.

Large-scale Pretraining

Trained on approximately 600,000 synthetic annotated images and 62 million real unlabeled images.

Multi-scale Support

Offers three model size options: Small/Base/Large.

Model Capabilities

Outdoor Scene Depth Estimation

Absolute Depth Prediction

Zero-shot Depth Estimation

Use Cases

Autonomous Driving

Road Scene Depth Perception

Lets autonomous vehicles perceive the depth of their surrounding environment.

Robot Navigation

Outdoor Environment Mapping

Assists robots in constructing 3D maps in outdoor environments.
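
As an illustration of how a metric depth map can feed a 3D map, the sketch below back-projects depth values into a point cloud with the standard pinhole camera model. The function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not part of this model card; real values would come from the camera calibration.

```python
import numpy as np


def backproject_depth(depth_m, fx, fy, cx, cy):
    """Back-project an (H, W) metric depth map into an (H*W, 3) point cloud.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    depth_m = np.asarray(depth_m, dtype=np.float64)
    h, w = depth_m.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    # One (X, Y, Z) row per pixel, in row-major pixel order.
    return np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)


# Assumed toy intrinsics for a 4x3 image; replace with calibrated values.
points = backproject_depth(np.full((3, 4), 10.0), fx=2.0, fy=2.0, cx=1.5, cy=1.0)
```

The resulting points are expressed in the camera frame, in the same unit as the depth map. Building a map across several frames would additionally require camera poses, which this sketch does not cover.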

🚀 Depth Anything V2 (Fine-tuned for Metric Depth Estimation) - Transformers Version

This model is a variant of Depth Anything V2 fine-tuned for outdoor metric depth estimation on the synthetic Virtual KITTI dataset. It is compatible with the transformers library.

Depth Anything V2, introduced in the paper of the same name by Lihe Yang et al., shares the architecture of the original Depth Anything but uses synthetic data and a larger-capacity teacher model for finer and more robust depth predictions. This metric depth estimation fine-tuned version was first released in this repository.

✨ Features

Multiple Model Scales: Six metric depth models are available, covering three scales each for indoor and outdoor scenes.
State-of-the-Art Results: Trained on ~600K synthetic labeled images and ~62 million real unlabeled images, the model achieves state-of-the-art results in both relative and absolute depth estimation.

📦 Installation

Requirements

transformers>=4.45.0

Alternatively, you can install the latest transformers version from the source:

pip install git+https://github.com/huggingface/transformers
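
To confirm that an environment satisfies the requirement above, a small check along the following lines may help. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the comparison deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Extract the leading numeric release, e.g. '4.46.0.dev0' -> (4, 46, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '4.45' so tuples compare as expected.
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum="4.45.0"):
    """Return True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


def transformers_is_recent_enough(minimum="4.45.0"):
    """Check the installed transformers package; False if it is not installed."""
    try:
        return meets_minimum(version("transformers"), minimum)
    except PackageNotFoundError:
        return False
```

Calling `transformers_is_recent_enough()` before loading the model gives an early, readable failure instead of an import or loading error later on.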

💻 Usage Examples

Basic Usage

Here's how to use this model for zero-shot depth estimation:

from transformers import pipeline
from PIL import Image
import requests

# load pipe
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Base-hf")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# run inference; the "depth" entry of the result is a PIL image of the depth map
depth = pipe(image)["depth"]
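
The pipeline result is a dictionary. As documented for the transformers depth-estimation pipeline, the "depth" entry is a PIL image intended for viewing, while the "predicted_depth" entry holds the raw prediction tensor, which is the one to read when the numeric depth values matter. The helper below is a hypothetical convenience wrapper, not part of the library, and it assumes the result contains a "predicted_depth" entry.

```python
import numpy as np


def depth_array_from_pipeline(result):
    """Return the raw depth prediction of a pipeline result as a 2-D float array."""
    raw = result["predicted_depth"]
    # torch tensors expose detach()/cpu(); plain arrays pass straight through.
    if hasattr(raw, "detach"):
        raw = raw.detach().cpu().numpy()
    # Drop singleton batch/channel dimensions so the result is (H, W).
    return np.squeeze(np.asarray(raw, dtype=np.float32))


# Usage with the pipeline from the example above (requires downloading the model):
#   depth_values = depth_array_from_pipeline(pipe(image))
```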

Advanced Usage

You can also use the model and processor classes:

from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Base-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Base-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to the original image size; PIL's image.size is (width, height), so it is reversed to (height, width)
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
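
The interpolated `prediction` has shape (1, 1, H, W). A common follow-up is to turn it into an 8-bit grayscale image for inspection. The sketch below shows one way to do that; the function name and the 80 m clipping range are assumptions chosen for illustration, not values stated in this model card.

```python
import numpy as np


def depth_to_uint8(depth_m, max_depth_m=80.0):
    """Map a depth array to 0..255, clipping values beyond max_depth_m.

    max_depth_m is an assumed display range; adjust it to the scene.
    """
    depth = np.clip(np.asarray(depth_m, dtype=np.float32), 0.0, max_depth_m)
    return (depth / max_depth_m * 255.0).round().astype(np.uint8)


# With the tensor from the example above, one would first convert it:
#   depth_m = prediction.squeeze().cpu().numpy()
#   gray = depth_to_uint8(depth_m)
#   Image.fromarray(gray).save("depth.png")
gray = depth_to_uint8([[0.0, 20.0, 80.0, 200.0]])
```

Unlike per-image min-max normalization, a fixed clipping range keeps gray levels comparable across images, at the cost of saturating anything farther than the chosen maximum.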

For more code examples, refer to the documentation.

📚 Documentation

Model Details

Base Model	Params	Indoor (Hypersim)	Outdoor (Virtual KITTI 2)
Depth-Anything-V2-Small	24.8M	Model Card	Model Card
Depth-Anything-V2-Base	97.5M	Model Card	Model Card
Depth-Anything-V2-Large	335.3M	Model Card	Model Card

Depth Anything V2 uses the DPT architecture with a DINOv2 backbone.

The model is trained on ~600K synthetic labeled images and ~62 million real unlabeled images.

[Figure: Depth Anything overview, taken from the original paper.]

Intended Uses & Limitations

You can use the raw model for tasks such as zero-shot depth estimation. Check the model hub for other versions suited to the task you are interested in.

📄 License

Citation

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, 
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}
