DUSt3R Open-Source 3D Vision Model - Easily Reconstruct 3D Scenes from Single or Multiple Images

Dust3r ViTLarge BaseDecoder 224 Linear

Developed by naver

DUSt3R is a geometric 3D vision model that reconstructs 3D scenes from a single image or from multiple images.

3D Vision

Safetensors

#Image to 3D #Geometric Reconstruction #ViT Architecture

Downloads 1,829

Release Time: 6/19/2024

Model Overview

DUSt3R is a 3D vision model based on the ViT architecture that recovers 3D geometric information from 2D images. It uses an asymmetric CroCo3DStereo architecture, accepts single-view or multi-view input, and outputs the geometric structure of the 3D scene.

Model Features

Single-view and Multi-view 3D Reconstruction

Capable of reconstructing 3D geometric structures from single or multiple images.

Efficient ViT Architecture

Uses a Vision Transformer architecture that pairs a large ViT-L encoder with a smaller ViT-B decoder.

Fixed Input Resolution

Trained at an input resolution of 224x224 pixels.
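
To give a sense of what a 224x224 input means for a ViT encoder, the sketch below counts the patch tokens produced per image. The patch size of 16 is an assumption (the usual ViT default, not stated on this page), and `vit_token_count` is a hypothetical helper, not part of the DUSt3R code.

```python
def vit_token_count(height: int, width: int, patch_size: int = 16) -> int:
    """Number of non-overlapping patch tokens a ViT encoder sees per image.

    patch_size=16 is an assumed default; check the checkpoint config.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    return (height // patch_size) * (width // patch_size)


tokens = vit_token_count(224, 224)
print(tokens)  # 196 (a 14 x 14 grid of patches)
```

Under that assumption, each 224x224 view becomes a sequence of 196 tokens, which is why this checkpoint is comparatively light on memory.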

Model Capabilities

3D Scene Reconstruction

Geometric Shape Recovery

Depth Estimation

Point Cloud Generation

Use Cases

Computer Vision

3D Scene Reconstruction

Reconstructs 3D scenes from one or more 2D images.

Generates the 3D geometric structure and depth information of the scene.
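
As a rough illustration of how depth and a point cloud relate to the geometric output, the sketch below assumes the per-view prediction is a pointmap (one 3D point per pixel, expressed in that camera's frame) with a per-pixel confidence, as described in the DUSt3R paper. The function name, the nested-list layout, and the threshold value are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def pointmap_to_depth_and_cloud(
    pointmap: List[List[Point]],
    confidence: List[List[float]],
    min_conf: float = 1.5,  # hypothetical threshold, tune per scene
) -> Tuple[List[List[float]], List[Point]]:
    """Split an H x W pointmap into a depth map and a filtered point cloud."""
    # In the camera's own frame, depth is simply the z coordinate of each point.
    depth = [[p[2] for p in row] for row in pointmap]
    # Keep only points whose confidence reaches the threshold.
    cloud = [
        p
        for point_row, conf_row in zip(pointmap, confidence)
        for p, c in zip(point_row, conf_row)
        if c >= min_conf
    ]
    return depth, cloud


# Tiny 2 x 2 example with one low-confidence pixel.
pointmap = [
    [(0.0, 0.0, 1.0), (0.1, 0.0, 1.2)],
    [(0.0, 0.1, 0.9), (0.1, 0.1, 5.0)],
]
confidence = [[3.0, 2.0], [1.8, 1.0]]

depth, cloud = pointmap_to_depth_and_cloud(pointmap, confidence)
print(depth)       # [[1.0, 1.2], [0.9, 5.0]]
print(len(cloud))  # 3
```

Reading depth off the z coordinate only holds for a pointmap expressed in its own camera frame; points expressed in another view's frame would first need to be transformed.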

Augmented Reality

Provides 3D environmental understanding for AR applications.

Robotic Vision

Environmental Perception

Helps robots understand the 3D structure of their surroundings.

Property	Details
Model Type	DUSt3R_ViTLarge_BaseDecoder_224_linear
Training resolutions	224x224
Head	Linear
Encoder	ViT-L
Decoder	ViT-B
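
The properties in the table are all encoded in the model type string. The sketch below splits that string back into its parts and attaches the standard ViT-Base and ViT-Large sizes. The naming pattern is inferred from this one checkpoint name, the sizes are the usual ViT values and are assumed (not confirmed by this page) to apply here, and `parse_checkpoint_name` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTSize:
    depth: int
    embed_dim: int
    num_heads: int


# Standard ViT-Base / ViT-Large sizes (assumed to apply to this checkpoint).
VIT_SIZES = {
    "Base": ViTSize(depth=12, embed_dim=768, num_heads=12),
    "Large": ViTSize(depth=24, embed_dim=1024, num_heads=16),
}


@dataclass(frozen=True)
class CheckpointInfo:
    encoder: ViTSize
    decoder: ViTSize
    resolution: int
    head: str


def parse_checkpoint_name(name: str) -> CheckpointInfo:
    """Parse names shaped like DUSt3R_ViT<Enc>_<Dec>Decoder_<res>_<head>."""
    _, enc, dec, res, head = name.split("_")
    if not enc.startswith("ViT") or not dec.endswith("Decoder"):
        raise ValueError(f"unexpected checkpoint name: {name!r}")
    return CheckpointInfo(
        encoder=VIT_SIZES[enc[len("ViT"):]],
        decoder=VIT_SIZES[dec[: -len("Decoder")]],
        resolution=int(res),
        head=head,
    )


info = parse_checkpoint_name("DUSt3R_ViTLarge_BaseDecoder_224_linear")
print(info.encoder.embed_dim, info.decoder.embed_dim, info.resolution, info.head)
# 1024 768 224 linear
```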
