DPT Large

Developed by Intel
A monocular depth estimation model based on Vision Transformer (ViT), trained on 1.4 million images, suitable for zero-shot depth prediction tasks.
Downloads: 364.62k
Release date: 3/2/2022

Model Overview

The Dense Prediction Transformer (DPT) is designed to estimate depth from a single image. It transfers across datasets without fine-tuning for specific scenes.

Model Features

Zero-shot transfer capability
Performs well on unseen datasets without fine-tuning, reaching a WHDR (weighted human disagreement rate, lower is better) of 10.82 on the DIW benchmark
Multi-dataset training
Trained on the MIX-6 dataset (approximately 1.4 million images), covering diverse scenarios
Vision Transformer architecture
Utilizes a ViT backbone combined with a specialized prediction head for dense prediction tasks
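
The WHDR figure quoted above measures how often a model's predicted depth ordering for a pair of points disagrees with a human annotation. The following is a minimal sketch of that idea, not the official DIW evaluation code. The sample layout, the "<" / ">" label convention ("<" meaning the first point is closer), and the larger_is_closer flag are all assumptions made for illustration; the result is scaled to a percentage on the assumption that the reported 10.82 is a percentage.

```python
def ordinal_relation(depth, point_a, point_b, larger_is_closer=True):
    """Return "<" if point_a is predicted closer than point_b, else ">".

    depth is any 2D structure indexable as depth[row][col].
    larger_is_closer=True assumes the map stores inverse depth
    (hypothetical default; flip it for metric depth maps).
    """
    (row_a, col_a), (row_b, col_b) = point_a, point_b
    value_a = depth[row_a][col_a]
    value_b = depth[row_b][col_b]
    a_is_closer = value_a > value_b if larger_is_closer else value_a < value_b
    return "<" if a_is_closer else ">"


def whdr(samples, larger_is_closer=True):
    """Weighted human disagreement rate, in percent.

    samples: iterable of (depth_map, point_a, point_b, human_label, weight),
    where human_label is "<" or ">" (illustrative convention).
    """
    disagreement = 0.0
    total_weight = 0.0
    for depth, point_a, point_b, human_label, weight in samples:
        predicted = ordinal_relation(depth, point_a, point_b, larger_is_closer)
        if predicted != human_label:
            disagreement += weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no annotated pairs to evaluate")
    return 100.0 * disagreement / total_weight
```

With unit weights, this reduces to the plain percentage of annotated pairs whose predicted ordering contradicts the human label.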

Model Capabilities

Single-image depth estimation
Cross-dataset zero-shot transfer
Dense (per-pixel) prediction

Use Cases

Computer Vision
Scene understanding
Infers scene depth information from a single RGB image
Can generate depth maps with the same resolution as the input image
Augmented Reality
Provides real-time depth perception for AR applications
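
As an illustration of the scene-understanding use case, the sketch below runs single-image depth estimation with the Hugging Face transformers library and resizes the prediction to the input image's resolution. The model identifier "Intel/dpt-large" is an assumption (it is not stated on this page), as are the helper names target_size and estimate_depth. Treat this as one possible way to call the model, not the author's reference pipeline.

```python
def target_size(pil_size):
    """Convert PIL's (width, height) to the (height, width) order torch expects."""
    width, height = pil_size
    return (height, width)


def estimate_depth(image, model_id="Intel/dpt-large"):
    """Return a depth map as an H x W numpy array matching the input PIL image.

    model_id is an assumed Hugging Face Hub identifier.
    """
    # Imported lazily so target_size() works without these packages installed.
    import torch
    from transformers import DPTForDepthEstimation, DPTImageProcessor

    processor = DPTImageProcessor.from_pretrained(model_id)
    model = DPTForDepthEstimation.from_pretrained(model_id)
    model.eval()

    # The processor resizes and normalizes the image for the ViT backbone.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        predicted = model(**inputs).predicted_depth  # shape: (1, h, w)

    # Interpolate the prediction back to the original image resolution.
    resized = torch.nn.functional.interpolate(
        predicted.unsqueeze(1),  # add a channel dimension: (1, 1, h, w)
        size=target_size(image.size),
        mode="bicubic",
        align_corners=False,
    )
    return resized.squeeze().cpu().numpy()
```

A call such as estimate_depth(Image.open("room.jpg").convert("RGB")), with PIL's Image imported and a hypothetical file name, would download the weights on first use. The output is a relative depth map, so its values are meaningful only in comparison with each other within the same image.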