The Best 177 3D Vision Tools in 2025

Hunyuan3D 2 (tencent) | License: Other | Downloads: 490.00k | Likes: 1,314
An advanced 3D synthesis system developed by Tencent, capable of generating high-resolution textured 3D assets from images or text.
Tags: 3D Vision, Multilingual

TRELLIS Image Large (microsoft) | License: MIT | Downloads: 463.44k | Likes: 520
TRELLIS Image Large is the image-conditioned variant of the large-scale 3D generation model TRELLIS, generating 3D content from input images.
Tags: 3D Vision, English

Depth Anything V2 Small Hf (depth-anything) | License: Apache-2.0 | Downloads: 438.72k | Likes: 15
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering fine detail and strong robustness. A usage sketch follows this entry.
Tags: 3D Vision, Transformers

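The "Hf" checkpoints in this list are repackaged for the Hugging Face transformers library, so they should load through the standard depth-estimation pipeline. A minimal sketch, assuming the repo id depth-anything/Depth-Anything-V2-Small-hf (inferred from this entry's name and publisher); the image path is a placeholder:

    from transformers import pipeline
    from PIL import Image

    # Load the transformers-packaged Depth Anything V2 Small checkpoint
    pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

    image = Image.open("example.jpg")  # placeholder input image
    result = pipe(image)
    result["depth"].save("depth.png")  # "depth" is a PIL image of the predicted map

The same pattern should apply to the other transformers-compatible depth checkpoints in this list; only the model id changes.
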
DPT Large (Intel) | License: Apache-2.0 | Downloads: 364.62k | Likes: 187
A monocular depth estimation model based on the Vision Transformer (ViT), trained on 1.4 million images and suited to zero-shot depth prediction tasks.
Tags: 3D Vision, Transformers

DPT Hybrid MiDaS (Intel) | License: Apache-2.0 | Downloads: 224.05k | Likes: 94
A monocular depth estimation model based on a hybrid Vision Transformer (ViT) backbone, trained on 1.4 million images.
Tags: 3D Vision, Transformers

VGGT 1B (facebook) | Downloads: 196.31k | Likes: 40
VGGT is a feed-forward neural network that infers all key 3D attributes of a scene from one, several, or hundreds of views within seconds.
Tags: 3D Vision, English

Depth Anything Large Hf (LiheYoung) | License: Apache-2.0 | Downloads: 147.17k | Likes: 51
Depth Anything is a depth estimation model based on the DPT architecture with a DINOv2 backbone, trained on approximately 62 million images and achieving state-of-the-art results in both relative and absolute depth estimation.
Tags: 3D Vision, Transformers

Depth Anything V2 Large (depth-anything) | Downloads: 130.54k | Likes: 94
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on large amounts of synthetic and real images and providing fine depth detail with high robustness.
Tags: 3D Vision, English

MASt3R ViTLarge BaseDecoder 512 Catmlpdpt Metric (naver) | Downloads: 116.60k | Likes: 15
MASt3R is a ViT-based image-to-3D model that grounds image matching in 3D space.
Tags: 3D Vision

Depth Anything Small Hf (LiheYoung) | License: Apache-2.0 | Downloads: 97.89k | Likes: 29
Depth Anything is a depth estimation model based on the DPT architecture with a DINOv2 backbone. Trained on approximately 62 million images, it excels at both relative and absolute depth estimation.
Tags: 3D Vision, Transformers

Marigold Depth V1 0 (prs-eth) | License: Apache-2.0 | Downloads: 92.50k | Likes: 127
A monocular depth estimation model fine-tuned from Stable Diffusion, producing affine-invariant depth predictions for natural scenes. A diffusers usage sketch follows this entry.
Tags: 3D Vision, English

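Marigold has dedicated pipeline support in the diffusers library. A minimal sketch, assuming a recent diffusers release that includes MarigoldDepthPipeline and the checkpoint id prs-eth/marigold-depth-v1-0 (inferred from this entry); the input path is a placeholder:

    import torch
    import diffusers

    # Marigold depth pipeline (fp16 on GPU for speed)
    pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
        "prs-eth/marigold-depth-v1-0", torch_dtype=torch.float16
    ).to("cuda")

    image = diffusers.utils.load_image("input.jpg")  # placeholder input
    result = pipe(image)

    # Colorize the affine-invariant prediction for inspection
    vis = pipe.image_processor.visualize_depth(result.prediction)
    vis[0].save("depth_colored.png")
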
Depth Anything V2 Large Hf (depth-anything) | Downloads: 83.99k | Likes: 19
Depth Anything V2 is currently the most powerful monocular depth estimation (MDE) model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering finer details and stronger robustness.
Tags: 3D Vision, Transformers

Depth Anything V2 Base (depth-anything) | Downloads: 66.95k | Likes: 17
Depth Anything V2 is currently the most powerful monocular depth estimation (MDE) model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images.
Tags: 3D Vision, English

Depth Anything V2 Small (depth-anything) | License: Apache-2.0 | Downloads: 55.22k | Likes: 64
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on large-scale synthetic and real images. Compared with V1, it captures finer detail and is more robust.
Tags: 3D Vision, English

DepthCrafter (tencent) | License: Other | Downloads: 55.08k | Likes: 91
DepthCrafter generates temporally coherent, finely detailed long depth sequences for open-world videos, without requiring additional information such as camera poses or optical flow.
Tags: 3D Vision

Depth Anything V2 Metric Indoor Large Hf (depth-anything) | Downloads: 47.99k | Likes: 9
A Depth Anything V2 variant fine-tuned for indoor metric depth estimation on the synthetic Hypersim dataset, compatible with the transformers library.
Tags: 3D Vision, Transformers

Depth Anything V2 Base Hf (depth-anything) | Downloads: 47.73k | Likes: 1
Depth Anything V2 is currently the most powerful monocular depth estimation model, trained on 595,000 synthetically annotated images and over 62 million real unlabeled images, offering finer details and stronger robustness.
Tags: 3D Vision, Transformers

DUSt3R ViTLarge BaseDecoder 512 Dpt (naver) | Downloads: 46.93k | Likes: 14
DUSt3R makes geometric 3D vision easy, reconstructing 3D scenes from single or multiple images.
Tags: 3D Vision

Lotus Depth G V1 0 (jingheya) | License: Apache-2.0 | Downloads: 33.45k | Likes: 21
Lotus is a diffusion-based visual foundation model focused on high-quality dense prediction tasks.
Tags: 3D Vision

DPT BEiT Base 384 (Intel) | License: MIT | Downloads: 25.98k | Likes: 1
DPT is a dense prediction transformer built on the BEiT backbone, designed for monocular depth estimation and trained on 1.4 million images.
Tags: 3D Vision, Transformers

Hunyuan3D 2mini (tencent) | License: Other | Downloads: 23.22k | Likes: 69
Tencent Hunyuan3D-2mini is a lightweight, efficient image-to-3D model with 600 million parameters, supporting both Chinese and English inputs.
Tags: 3D Vision, Multilingual

Marigold Depth LCM V1 0 (prs-eth) | License: Apache-2.0 | Downloads: 22.45k | Likes: 55
A monocular depth estimation model fine-tuned with latent consistency distillation for generating depth maps from single images.
Tags: 3D Vision, English

ZoeDepth NYU KITTI (Intel) | License: MIT | Downloads: 20.32k | Likes: 5
ZoeDepth is a depth estimation model fine-tuned on the NYU and KITTI datasets, estimating depth in real metric units. A usage sketch follows this entry.
Tags: 3D Vision, Transformers

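Unlike relative-depth models, ZoeDepth predicts metric depth directly. A minimal sketch using the transformers model classes, assuming a transformers version that includes ZoeDepth; the image path is a placeholder:

    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation

    processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
    model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")

    image = Image.open("room.jpg")  # placeholder input
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Upsample the prediction back to the input resolution;
    # values are metric depth (meters) for this checkpoint
    depth = torch.nn.functional.interpolate(
        outputs.predicted_depth.unsqueeze(1),
        size=image.size[::-1],
        mode="bicubic",
        align_corners=False,
    ).squeeze()
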
Yoso Normal V0 3 (Stable-X) | License: Apache-2.0 | Downloads: 20.30k | Likes: 1
A model that generates stable, sharp normal maps by reducing diffusion variance.
Tags: 3D Vision

TRELLIS Normal V0 1 (Stable-X) | License: MIT | Downloads: 19.31k | Likes: 10
An improved version of TRELLIS that converts 2D images into 3D models, with dedicated support for normal conditioning.
Tags: 3D Vision, English

TripoSR (stabilityai) | License: MIT | Downloads: 19.25k | Likes: 545
TripoSR is a fast feed-forward 3D generation model developed jointly by Stability AI and Tripo AI, specializing in rapid 3D reconstruction from a single image.
Tags: 3D Vision

Depth Anything Vitl14 (LiheYoung) | Downloads: 16.70k | Likes: 42
Depth Anything is a powerful depth estimation model that unleashes the potential of large-scale unlabeled data.
Tags: 3D Vision, Transformers

Fast3R ViT Large 512 (jedyang97) | License: Other | Downloads: 16.34k | Likes: 20
Fast3R is an image-to-3D model developed by Facebook Research.
Tags: 3D Vision

DepthPro Hf (apple) | Downloads: 13.96k | Likes: 52
DepthPro is a foundation model for zero-shot metric monocular depth estimation, producing high-resolution, high-precision depth maps. A usage sketch follows this entry.
Tags: 3D Vision, Transformers

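As with the other "Hf" checkpoints, this repackaging targets the transformers library. A minimal sketch, assuming a recent transformers release that includes DepthPro support and the repo id apple/DepthPro-hf (inferred from this entry's name and publisher); the image path is a placeholder:

    from transformers import pipeline
    from PIL import Image

    # Zero-shot metric depth via the transformers depth-estimation pipeline
    pipe = pipeline("depth-estimation", model="apple/DepthPro-hf")

    result = pipe(Image.open("scene.jpg"))  # placeholder input
    result["depth"].save("scene_depth.png")  # predicted depth map as a PIL image
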
OpenLRM Mix Base 1.1 (zxhezexin) | Downloads: 10.25k | Likes: 6
OpenLRM is an open-source implementation of the LRM paper that generates 3D models from a single image, released at multiple scales.
Tags: 3D Vision, Transformers

Hunyuan3D 2mv (tencent) | License: Other | Downloads: 9,170 | Likes: 371
Hunyuan3D-2mv is a multi-view edition of Hunyuan3D-2, fine-tuned to support high-resolution textured 3D asset generation with multi-view shape control.
Tags: 3D Vision, Multilingual

Depth Anything V2 Metric Indoor Base Hf (depth-anything) | Downloads: 9,056 | Likes: 1
A Depth Anything V2 variant fine-tuned for indoor metric depth estimation on the synthetic Hypersim dataset.
Tags: 3D Vision, Transformers

Marigold Normals V0 1 (prs-eth) | License: Apache-2.0 | Downloads: 8,845 | Likes: 4
A monocular normal estimation model fine-tuned from Stable Diffusion, predicting surface normal maps from a single RGB image.
Tags: 3D Vision, English

Depth Anything Vits14 (LiheYoung) | Downloads: 8,130 | Likes: 6
Depth Anything is a depth estimation model that leverages large-scale unlabeled data to boost performance on monocular depth estimation tasks.
Tags: 3D Vision, Transformers

GLPN NYU (vinvino02) | License: Apache-2.0 | Downloads: 7,699 | Likes: 22
GLPN is trained on the NYUv2 dataset for monocular depth estimation, combining global and local path networks to achieve high-precision depth prediction.
Tags: 3D Vision, Transformers

MonST3R PO TA S W ViTLarge BaseDecoder 512 Dpt (Junyi42) | Downloads: 7,641 | Likes: 17
MonST3R is a simple approach to estimating geometry in the presence of motion, reconstructing 3D scenes from images.
Tags: 3D Vision

Depth Anything Vitb14 (LiheYoung) | Downloads: 7,152 | Likes: 3
Depth Anything is a depth estimation model trained on large-scale unlabeled data, capable of predicting depth from a single image.
Tags: 3D Vision, Transformers

Yoso Normal V1 8 1 (Stable-X) | License: Apache-2.0 | Downloads: 7,080 | Likes: 3
A model that generates stable, sharp normal maps by reducing diffusion variance.
Tags: 3D Vision

ZoeDepth KITTI (Intel) | License: MIT | Downloads: 7,037 | Likes: 2
ZoeDepth is a monocular depth estimation model fine-tuned on the KITTI dataset, capable of zero-shot transfer for metric depth estimation.
Tags: 3D Vision, Transformers

Shap-E (openai) | License: MIT | Downloads: 6,109 | Likes: 234
Shap-E is a diffusion-based text-to-3D generation model that produces 3D assets renderable as textured meshes and neural radiance fields from text prompts. A usage sketch follows this entry.
Tags: 3D Vision

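Shap-E has a dedicated pipeline in the diffusers library. A minimal text-to-3D sketch, assuming a diffusers release that includes ShapEPipeline; the prompt and output path are placeholders:

    import torch
    from diffusers import ShapEPipeline
    from diffusers.utils import export_to_gif

    pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

    frames = pipe(
        "a red sports car",      # placeholder text prompt
        guidance_scale=15.0,
        num_inference_steps=64,
        frame_size=256,
    ).images

    export_to_gif(frames[0], "car.gif")  # turntable render of the generated asset
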
Theia Base Patch16 224 Cddsv (theaiinstitute) | License: Other | Downloads: 5,404 | Likes: 2
Theia is a vision foundation model for robot learning, enriched with visual representations distilled from multiple vision foundation models.
Tags: 3D Vision, Transformers

TripoSG (VAST-AI) | License: MIT | Downloads: 5,402 | Likes: 101
TripoSG is a high-fidelity foundation model for 3D shape synthesis based on large-scale rectified flow, capable of generating high-quality 3D meshes from a single image.
Tags: 3D Vision