🚀 Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
A foundation model for zero-shot metric monocular depth estimation, synthesizing high-resolution depth maps with sharpness and high-frequency details.

We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image.
Depth Pro was introduced in Depth Pro: Sharp Monocular Metric Depth in Less Than a Second, by Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun.
The checkpoint in this repository is a reference implementation, which has been re-trained. Its performance is close to the model reported in the paper but does not match it exactly.
🚀 Quick Start
Please follow the steps in the code repository to set up your environment. Then you can start using the model.
💻 Usage Examples
Basic Usage
```python
from huggingface_hub import PyTorchModelHubMixin
from depth_pro import create_model_and_transforms, load_rgb
from depth_pro.depth_pro import (create_backbone_model, load_monodepth_weights,
                                 DepthPro, DepthProEncoder, MultiresConvDecoder)
import depth_pro
from torchvision.transforms import Compose, Normalize, ToTensor


class DepthProWrapper(DepthPro, PyTorchModelHubMixin):
    """Depth Pro network."""

    def __init__(
        self,
        patch_encoder_preset: str,
        image_encoder_preset: str,
        decoder_features: str,
        fov_encoder_preset: str,
        use_fov_head: bool = True,
        **kwargs,
    ):
        """Initialize Depth Pro."""
        # Backbones for the multi-scale patch encoder and the image encoder.
        patch_encoder, patch_encoder_config = create_backbone_model(
            preset=patch_encoder_preset
        )
        image_encoder, _ = create_backbone_model(
            preset=image_encoder_preset
        )

        # Optional backbone for the field-of-view (focal length) head.
        fov_encoder = None
        if use_fov_head and fov_encoder_preset is not None:
            fov_encoder, _ = create_backbone_model(preset=fov_encoder_preset)

        dims_encoder = patch_encoder_config.encoder_feature_dims
        hook_block_ids = patch_encoder_config.encoder_feature_layer_ids
        encoder = DepthProEncoder(
            dims_encoder=dims_encoder,
            patch_encoder=patch_encoder,
            image_encoder=image_encoder,
            hook_block_ids=hook_block_ids,
            decoder_features=decoder_features,
        )
        decoder = MultiresConvDecoder(
            dims_encoder=[encoder.dims_encoder[0]] + list(encoder.dims_encoder),
            dim_decoder=decoder_features,
        )

        super().__init__(
            encoder=encoder,
            decoder=decoder,
            last_dims=(32, 1),
            use_fov_head=use_fov_head,
            fov_encoder=fov_encoder,
        )


# Load the pretrained checkpoint and define the preprocessing transform.
model = DepthProWrapper.from_pretrained("apple/DepthPro-mixin")
transform = Compose(
    [
        ToTensor(),
        Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
    ]
)
model.eval()

# Load and preprocess an image.
image_path = "path/to/image.jpg"  # Placeholder: replace with your input image.
image, _, f_px = depth_pro.load_rgb(image_path)  # f_px is the focal length from image metadata, if available.
image = transform(image)

# Run inference.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]  # Depth in meters.
focallength_px = prediction["focallength_px"]  # Focal length in pixels.
```
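Because the predicted depth is metric and the model also estimates the focal length, the two outputs can be combined to back-project pixels into a metric point cloud. The snippet below is a minimal sketch rather than part of the Depth Pro API: it assumes a pinhole camera with the principal point at the image center and uses only NumPy together with the `depth` and `focallength_px` values from the example above.

```python
import numpy as np

# Convert model outputs to NumPy (depth in meters, focal length in pixels).
depth_m = depth.detach().cpu().numpy().squeeze()
f = float(focallength_px)

# Pixel grid; the principal point is assumed to lie at the image center.
h, w = depth_m.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))
cx, cy = w / 2.0, h / 2.0

# Pinhole back-projection: X = (u - cx) * Z / f, Y = (v - cy) * Z / f, Z = depth.
x = (u - cx) * depth_m / f
y = (v - cy) * depth_m / f
points = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)

print(points.shape)  # (H * W, 3) metric 3D points in camera coordinates.
```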
Advanced Usage
The code repository also provides scale-invariant boundary metrics (in `eval/boundary_metrics.py`) for evaluating the sharpness of predicted depth maps: `SI_boundary_F1` compares a prediction against a ground-truth depth map, while `SI_boundary_Recall` evaluates it against a binary mask.

```python
# For datasets with ground-truth depth maps.
boundary_f1 = SI_boundary_F1(predicted_depth, target_depth)

# For datasets with binary masks (e.g., matting or segmentation annotations).
boundary_recall = SI_boundary_Recall(predicted_depth, target_mask)
```
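For a runnable end-to-end check, the sketch below calls the metrics on toy NumPy arrays. The import path `depth_pro.eval.boundary_metrics` is an assumption about how the repository's `eval/boundary_metrics.py` is packaged; adjust it to your checkout, and substitute real predictions and ground truth for the random arrays.

```python
import numpy as np

# Assumed import path for eval/boundary_metrics.py; adjust to your checkout's layout.
from depth_pro.eval.boundary_metrics import SI_boundary_F1, SI_boundary_Recall

# Toy stand-ins for a predicted and a ground-truth depth map, in meters.
predicted_depth = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)
target_depth = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)

# Scale-invariant boundary F1 against ground-truth depth.
boundary_f1 = SI_boundary_F1(predicted_depth, target_depth)

# Scale-invariant boundary recall against a binary foreground mask.
target_mask = target_depth < 5.0
boundary_recall = SI_boundary_Recall(predicted_depth, target_mask)

print(f"SI boundary F1: {boundary_f1:.3f}, SI boundary recall: {boundary_recall:.3f}")
```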
📄 License
The model is released under the apple-amlr license.
| Property | Details |
|----------|---------|
| Model Type | Foundation model for zero-shot metric monocular depth estimation |
| Pipeline Tag | depth-estimation |
| Tags | model_hub_mixin, pytorch_model_hub_mixin |
📚 Documentation
Citation
If you find our work useful, please cite the following paper:
```bibtex
@article{Bochkovskii2024:arxiv,
  author  = {Aleksei Bochkovskii and Ama\"{e}l Delaunoy and Hugo Germain and Marcel Santos and
             Yichao Zhou and Stephan R. Richter and Vladlen Koltun},
  title   = {Depth Pro: Sharp Monocular Metric Depth in Less Than a Second},
  journal = {arXiv},
  year    = {2024},
}
```
Acknowledgements
Our codebase builds on multiple open-source contributions; please see Acknowledgements for more details.
Please check the paper for a complete list of references and datasets used in this work.