DPT Hybrid MiDaS

Developed by Intel
A monocular depth estimation model based on the Vision Transformer (ViT), trained on 1.4 million images
Downloads: 224.05k
Release date: 12/6/2022

Model Overview

This is a Dense Prediction Transformer (DPT) model for monocular depth estimation. It uses ViT-hybrid as its backbone network and predicts depth from a single image.
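
As a rough illustration of single-image depth prediction, the sketch below runs a DPT checkpoint through the Hugging Face Transformers library. It assumes the checkpoint is published under the model ID Intel/dpt-hybrid-midas and that torch, Pillow, and transformers are installed; neither detail is stated on this page. The function names estimate_depth and pil_size_to_hw are hypothetical helpers introduced only for this example.

```python
def pil_size_to_hw(size):
    """PIL reports (width, height); torch interpolation expects (height, width)."""
    width, height = size
    return (height, width)


def estimate_depth(image_path, model_id="Intel/dpt-hybrid-midas"):
    """Return a 2D tensor of relative depth at the input image's resolution.

    The default model_id is an assumption, not taken from this page.
    """
    # Heavy imports stay local so the helpers above load without them.
    import torch
    from PIL import Image
    from transformers import DPTForDepthEstimation, DPTImageProcessor

    image = Image.open(image_path).convert("RGB")
    processor = DPTImageProcessor.from_pretrained(model_id)
    model = DPTForDepthEstimation.from_pretrained(model_id)
    model.eval()

    # The processor resizes and normalizes the image for the model.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        predicted = model(**inputs).predicted_depth  # shape (1, H', W')

    # Resize the prediction back to the original image resolution.
    resized = torch.nn.functional.interpolate(
        predicted.unsqueeze(1),
        size=pil_size_to_hw(image.size),
        mode="bicubic",
        align_corners=False,
    )
    return resized[0, 0]
```

Calling estimate_depth("photo.jpg") (a placeholder path) would download the weights on first use and return one relative depth value per pixel. The values are relative, not metric distances.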

Model Features

Zero-shot transfer capability
The model transfers zero-shot to datasets it was not trained on and performs well on them
Hybrid architecture
Uses ViT-hybrid as the backbone network, combining the advantages of convolution and transformers
Large-scale training
Trained on the MIX-6 dataset of approximately 1.4 million images, which supports generalization to new data

Model Capabilities

Monocular depth estimation
Zero-shot transfer
Image depth prediction

Use Cases

Computer vision
Scene depth analysis
Estimates the relative depth of objects in a scene from a single image
Can generate a depth map corresponding to the input image
3D scene reconstruction
Provides depth information for 3D reconstruction
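
The two use cases above can be sketched in plain Python, under stated assumptions. The first helper rescales a relative depth map to the 0 to 255 range so it can be saved as a grayscale image. The second back-projects a depth map to 3D points with a pinhole camera model. The function names and the intrinsics fx, fy, cx, cy are hypothetical and not part of the model. Because the model predicts relative depth, points obtained this way are meaningful only after the caller has converted the prediction to positive depth values at a known scale.

```python
def normalize_to_uint8(depth):
    """Linearly rescale a 2D list of depth values to integers in 0..255."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no contrast; return all zeros.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth]


def backproject(depth, fx, fy, cx, cy):
    """Back-project a 2D list of depths Z to (X, Y, Z) points.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy,
    where u is the column index and v is the row index.
    The intrinsics are assumed to be supplied by the caller.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

For real images, the same arithmetic would normally be vectorized with an array library; the list-based version is kept here only to make each step explicit.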