The Best 177 3D Vision Tools in 2025

Hunyuan3D 2 (tencent) | License: Other | Downloads: 490.00k | Likes: 1,314
An advanced 3D synthesis system developed by Tencent, capable of generating high-resolution textured 3D assets from images or text.
Tags: 3D Vision, Multilingual

TRELLIS Image Large (microsoft) | License: MIT | Downloads: 463.44k | Likes: 520
TRELLIS Image Large is the image-conditioned variant of the large-scale 3D generation model TRELLIS, generating 3D content from input images.
Tags: 3D Vision, English

Depth Anything V2 Small Hf (depth-anything) | License: Apache-2.0 | Downloads: 438.72k | Likes: 15
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering fine detail and strong robustness. A usage sketch follows this entry.
Tags: 3D Vision, Transformers

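The "Hf" checkpoints in this list are repackaged for the Hugging Face transformers library, so they should load through the standard depth-estimation pipeline. A minimal sketch, assuming the repo id depth-anything/Depth-Anything-V2-Small-hf (inferred from this entry's name and publisher); the image path is a placeholder:

    from transformers import pipeline
    from PIL import Image

    # Load the transformers-packaged Depth Anything V2 Small checkpoint
    pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

    image = Image.open("example.jpg")  # placeholder input image
    result = pipe(image)
    result["depth"].save("depth.png")  # "depth" is a PIL image of the predicted map

The same pattern should apply to the other transformers-compatible depth checkpoints in this list; only the model id changes.
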
DPT Large (Intel) | License: Apache-2.0 | Downloads: 364.62k | Likes: 187
A monocular depth estimation model based on the Vision Transformer (ViT), trained on 1.4 million images and suited to zero-shot depth prediction tasks.
Tags: 3D Vision, Transformers

DPT Hybrid MiDaS (Intel) | License: Apache-2.0 | Downloads: 224.05k | Likes: 94
A monocular depth estimation model based on a hybrid Vision Transformer (ViT) backbone, trained on 1.4 million images.
Tags: 3D Vision, Transformers

VGGT 1B (facebook) | Downloads: 196.31k | Likes: 40
VGGT is a feed-forward neural network that infers all key 3D attributes of a scene from one, several, or hundreds of views within seconds.
Tags: 3D Vision, English

Depth Anything Large Hf (LiheYoung) | License: Apache-2.0 | Downloads: 147.17k | Likes: 51
Depth Anything is a depth estimation model based on the DPT architecture with a DINOv2 backbone, trained on approximately 62 million images and achieving state-of-the-art results in both relative and absolute depth estimation.
Tags: 3D Vision, Transformers

Depth Anything V2 Large (depth-anything) | Downloads: 130.54k | Likes: 94
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on large amounts of synthetic and real images and providing fine depth detail with high robustness.
Tags: 3D Vision, English

MASt3R ViTLarge BaseDecoder 512 Catmlpdpt Metric (naver) | Downloads: 116.60k | Likes: 15
MASt3R is a ViT-based image-to-3D model that grounds image matching in 3D space.
Tags: 3D Vision

Depth Anything Small Hf (LiheYoung) | License: Apache-2.0 | Downloads: 97.89k | Likes: 29
Depth Anything is a depth estimation model based on the DPT architecture with a DINOv2 backbone. Trained on approximately 62 million images, it excels at both relative and absolute depth estimation.
Tags: 3D Vision, Transformers

Marigold Depth V1 0 (prs-eth) | License: Apache-2.0 | Downloads: 92.50k | Likes: 127
A monocular depth estimation model fine-tuned from Stable Diffusion, producing affine-invariant depth predictions for natural scenes. A diffusers usage sketch follows this entry.
Tags: 3D Vision, English

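Marigold has dedicated pipeline support in the diffusers library. A minimal sketch, assuming a recent diffusers release that includes MarigoldDepthPipeline and the checkpoint id prs-eth/marigold-depth-v1-0 (inferred from this entry); the input path is a placeholder:

    import torch
    import diffusers

    # Marigold depth pipeline (fp16 on GPU for speed)
    pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
        "prs-eth/marigold-depth-v1-0", torch_dtype=torch.float16
    ).to("cuda")

    image = diffusers.utils.load_image("input.jpg")  # placeholder input
    result = pipe(image)

    # Colorize the affine-invariant prediction for inspection
    vis = pipe.image_processor.visualize_depth(result.prediction)
    vis[0].save("depth_colored.png")
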
Depth Anything V2 Large Hf (depth-anything) | Downloads: 83.99k | Likes: 19
Depth Anything V2 is currently the most powerful monocular depth estimation (MDE) model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering finer details and stronger robustness.
Tags: 3D Vision, Transformers

Depth Anything V2 Base (depth-anything) | Downloads: 66.95k | Likes: 17
Depth Anything V2 is currently the most powerful monocular depth estimation (MDE) model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images.
Tags: 3D Vision, English

Depth Anything V2 Small (depth-anything) | License: Apache-2.0 | Downloads: 55.22k | Likes: 64
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on large-scale synthetic and real images. Compared with V1, it captures finer detail and is more robust.
Tags: 3D Vision, English

DepthCrafter (tencent) | License: Other | Downloads: 55.08k | Likes: 91
DepthCrafter generates temporally coherent, finely detailed long depth sequences for open-world videos, without requiring additional information such as camera poses or optical flow.
Tags: 3D Vision

Depth Anything V2 Metric Indoor Large Hf (depth-anything) | Downloads: 47.99k | Likes: 9
A Depth Anything V2 variant fine-tuned for indoor metric depth estimation on the synthetic Hypersim dataset, compatible with the transformers library.
Tags: 3D Vision, Transformers

Depth Anything V2 Base Hf (depth-anything) | Downloads: 47.73k | Likes: 1
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering finer details and stronger robustness.
Tags: 3D Vision, Transformers

DUSt3R ViTLarge BaseDecoder 512 Dpt (naver) | Downloads: 46.93k | Likes: 14
DUSt3R makes geometric 3D vision easy, reconstructing 3D scenes from single or multiple images.
Tags: 3D Vision

Lotus Depth G V1 0 (jingheya) | License: Apache-2.0 | Downloads: 33.45k | Likes: 21
Lotus is a diffusion-based visual foundation model focused on high-quality dense prediction tasks.
Tags: 3D Vision

DPT BEiT Base 384 (Intel) | License: MIT | Downloads: 25.98k | Likes: 1
DPT is a dense prediction transformer built on the BEiT backbone, designed for monocular depth estimation and trained on 1.4 million images.
Tags: 3D Vision, Transformers

Hunyuan3D 2mini (tencent) | License: Other | Downloads: 23.22k | Likes: 69
Tencent Hunyuan3D-2mini is a lightweight, efficient image-to-3D model with 600 million parameters, supporting both Chinese and English inputs.
Tags: 3D Vision, Multilingual

Marigold Depth LCM V1 0 (prs-eth) | License: Apache-2.0 | Downloads: 22.45k | Likes: 55
A monocular depth estimation model fine-tuned with latent consistency distillation for generating depth maps from single images.
Tags: 3D Vision, English

ZoeDepth NYU KITTI (Intel) | License: MIT | Downloads: 20.32k | Likes: 5
ZoeDepth is a depth estimation model fine-tuned on the NYU and KITTI datasets, estimating depth in real metric units. A usage sketch follows this entry.
Tags: 3D Vision, Transformers

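Unlike relative-depth models, ZoeDepth predicts metric depth directly. A minimal sketch using the transformers model classes, assuming a transformers version that includes ZoeDepth; the image path is a placeholder:

    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation

    processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
    model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")

    image = Image.open("room.jpg")  # placeholder input
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Upsample the prediction back to the input resolution;
    # values are metric depth (meters) for this checkpoint
    depth = torch.nn.functional.interpolate(
        outputs.predicted_depth.unsqueeze(1),
        size=image.size[::-1],
        mode="bicubic",
        align_corners=False,
    ).squeeze()
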
Yoso Normal V0 3 (Stable-X) | License: Apache-2.0 | Downloads: 20.30k | Likes: 1
A model that generates stable, sharp normal maps by reducing diffusion variance.
Tags: 3D Vision

TRELLIS Normal V0 1 (Stable-X) | License: MIT | Downloads: 19.31k | Likes: 10
An improved version of TRELLIS that converts 2D images into 3D models, with dedicated support for normal conditioning.
Tags: 3D Vision, English

TripoSR (stabilityai) | License: MIT | Downloads: 19.25k | Likes: 545
TripoSR is a fast feed-forward 3D generation model developed jointly by Stability AI and Tripo AI, specializing in rapid 3D reconstruction from a single image.
Tags: 3D Vision

Depth Anything Vitl14 (LiheYoung) | Downloads: 16.70k | Likes: 42
Depth Anything is a powerful depth estimation model that unleashes the potential of large-scale unlabeled data.
Tags: 3D Vision, Transformers

Fast3R ViT Large 512 (jedyang97) | License: Other | Downloads: 16.34k | Likes: 20
Fast3R is an image-to-3D model developed by Facebook Research.
Tags: 3D Vision

DepthPro Hf (apple) | Downloads: 13.96k | Likes: 52
DepthPro is a foundation model for zero-shot metric monocular depth estimation, producing high-resolution, high-precision depth maps. A usage sketch follows this entry.
Tags: 3D Vision, Transformers

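As with the other "Hf" checkpoints, this repackaging targets the transformers library. A minimal sketch, assuming a recent transformers release that includes DepthPro support and the repo id apple/DepthPro-hf (inferred from this entry's name and publisher); the image path is a placeholder:

    from transformers import pipeline
    from PIL import Image

    # Zero-shot metric depth via the transformers depth-estimation pipeline
    pipe = pipeline("depth-estimation", model="apple/DepthPro-hf")

    result = pipe(Image.open("scene.jpg"))  # placeholder input
    result["depth"].save("scene_depth.png")  # predicted depth map as a PIL image
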
OpenLRM Mix Base 1.1 (zxhezexin) | Downloads: 10.25k | Likes: 6
OpenLRM is an open-source implementation of the LRM paper that generates 3D models from a single image, released at multiple scales.
Tags: 3D Vision, Transformers

Hunyuan3D 2mv (tencent) | License: Other | Downloads: 9,170 | Likes: 371
Hunyuan3D-2mv is a multi-view edition of Hunyuan3D-2, fine-tuned to support high-resolution textured 3D asset generation with multi-view shape control.
Tags: 3D Vision, Multilingual

Depth Anything V2 Metric Indoor Base Hf (depth-anything) | Downloads: 9,056 | Likes: 1
A Depth Anything V2 variant fine-tuned for indoor metric depth estimation on the synthetic Hypersim dataset.
Tags: 3D Vision, Transformers

Marigold Normals V0 1 (prs-eth) | License: Apache-2.0 | Downloads: 8,845 | Likes: 4
A monocular normal estimation model fine-tuned from Stable Diffusion, predicting surface normal maps from a single RGB image.
Tags: 3D Vision, English

Depth Anything Vits14 (LiheYoung) | Downloads: 8,130 | Likes: 6
Depth Anything is a depth estimation model that leverages large-scale unlabeled data to boost performance on monocular depth estimation tasks.
Tags: 3D Vision, Transformers

GLPN NYU (vinvino02) | License: Apache-2.0 | Downloads: 7,699 | Likes: 22
GLPN is trained on the NYUv2 dataset for monocular depth estimation, combining global and local path networks to achieve high-precision depth prediction.
Tags: 3D Vision, Transformers

MonST3R PO TA S W ViTLarge BaseDecoder 512 Dpt (Junyi42) | Downloads: 7,641 | Likes: 17
MonST3R is a simple approach to estimating geometry in the presence of motion, reconstructing 3D scenes from images.
Tags: 3D Vision

Depth Anything Vitb14 (LiheYoung) | Downloads: 7,152 | Likes: 3
Depth Anything is a depth estimation model trained on large-scale unlabeled data, capable of predicting depth from a single image.
Tags: 3D Vision, Transformers

Yoso Normal V1 8 1 (Stable-X) | License: Apache-2.0 | Downloads: 7,080 | Likes: 3
A model that generates stable, sharp normal maps by reducing diffusion variance.
Tags: 3D Vision

ZoeDepth KITTI (Intel) | License: MIT | Downloads: 7,037 | Likes: 2
ZoeDepth is a monocular depth estimation model fine-tuned on the KITTI dataset, capable of zero-shot transfer for metric depth estimation.
Tags: 3D Vision, Transformers

Shap-E (openai) | License: MIT | Downloads: 6,109 | Likes: 234
Shap-E is a diffusion-based text-to-3D generation model that produces 3D assets renderable as textured meshes and neural radiance fields from text prompts. A usage sketch follows this entry.
Tags: 3D Vision

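Shap-E has a dedicated pipeline in the diffusers library. A minimal text-to-3D sketch, assuming a diffusers release that includes ShapEPipeline; the prompt and output path are placeholders:

    import torch
    from diffusers import ShapEPipeline
    from diffusers.utils import export_to_gif

    pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

    frames = pipe(
        "a red sports car",      # placeholder text prompt
        guidance_scale=15.0,
        num_inference_steps=64,
        frame_size=256,
    ).images

    export_to_gif(frames[0], "car.gif")  # turntable render of the generated asset
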
Theia Base Patch16 224 Cddsv (theaiinstitute) | License: Other | Downloads: 5,404 | Likes: 2
Theia is a vision foundation model for robot learning, enriched with visual representations distilled from multiple vision foundation models.
Tags: 3D Vision, Transformers

TripoSG (VAST-AI) | License: MIT | Downloads: 5,402 | Likes: 101
TripoSG is a high-fidelity foundation model for 3D shape synthesis based on large-scale rectified flow, capable of generating high-quality 3D meshes from a single image.
Tags: 3D Vision