Pi3
π³ is a scalable, permutation-equivariant model for visual geometry learning that reconstructs 3D geometry from images without relying on a fixed reference view.
Downloads 229
Release Time: 7/14/2025
Model Overview
By eliminating the need for a fixed reference view and adopting a fully permutation-equivariant architecture, π³ can directly predict affine-invariant camera poses and scale-invariant local point maps from an unordered set of images. It is robust to the input order and highly scalable.
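To make the term concrete, here is a minimal Python sketch of permutation equivariance on a toy function. It is not the π³ network: the hypothetical `toy_equivariant` updates each input vector using only that vector and an order-independent summary (the mean) of the whole set. Shuffling the inputs therefore shuffles the outputs in exactly the same way.

```python
def toy_equivariant(views):
    """Toy permutation-equivariant map (illustrative only, not the π³ model).

    Each output depends on its own input and on the mean over all inputs,
    and the mean does not change when the inputs are reordered.
    """
    n = len(views)
    dim = len(views[0])
    mean = [sum(v[d] for v in views) / n for d in range(dim)]
    return [[v[d] - mean[d] for d in range(dim)] for v in views]


def permute(items, order):
    """Return items rearranged so that position k holds items[order[k]]."""
    return [items[i] for i in order]


# Four made-up 2D "view features" and an arbitrary reordering.
views = [[1.0, 2.0], [3.0, 5.0], [-2.0, 0.5], [4.0, 4.0]]
order = [2, 0, 3, 1]

out_then_permute = permute(toy_equivariant(views), order)
permute_then_out = toy_equivariant(permute(views, order))

# Equivariance: both paths give the same result (up to float rounding).
is_equivariant = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(out_then_permute, permute_then_out)
    for a, b in zip(row_a, row_b)
)
print(is_equivariant)
```

A pipeline that anchors everything to the first image would fail this check, because changing which image comes first would change every output.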
Model Features
Permutation equivariance
The architecture is fully permutation-equivariant: reordering the input images reorders the outputs in the same way, so the results do not depend on the input order and no fixed reference view is required.
Highly scalable
The design is simple and avoids the bias introduced by a fixed reference view, and it can handle large, unordered image sets.
Affine invariance
It can directly predict affine-invariant camera poses and scale-invariant local point maps.
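A scale-invariant point map is defined only up to an unknown positive global scale, so comparing it with metric reference geometry first requires recovering that scale. The sketch below shows one common way to do this, a closed-form least-squares fit. It is an illustrative convention, not necessarily the alignment procedure used by the π³ authors; the function names and sample points are hypothetical.

```python
def optimal_scale(pred, ref):
    """Closed-form least-squares scale s minimizing sum ||s * p - q||^2.

    pred and ref are equal-length lists of 3D points in corresponding order.
    """
    num = sum(pc * qc for p, q in zip(pred, ref) for pc, qc in zip(p, q))
    den = sum(pc * pc for p in pred for pc in p)
    if den == 0.0:
        raise ValueError("prediction is all zeros; scale is undefined")
    return num / den


def scale_points(points, s):
    """Multiply every coordinate of every point by s."""
    return [[s * c for c in p] for p in points]


# Made-up reference points and a "prediction" with the same shape
# but a different global scale.
ref = [[0.0, 0.0, 2.0], [1.0, 0.0, 4.0], [0.0, 1.0, 3.0]]
pred = scale_points(ref, 0.25)

s = optimal_scale(pred, ref)
aligned = scale_points(pred, s)
max_err = max(
    abs(a - b) for pa, pb in zip(aligned, ref) for a, b in zip(pa, pb)
)
print(s, max_err)
```

Setting the derivative of the squared error with respect to `s` to zero gives `s = Σ p·q / Σ p·p`, which is what `optimal_scale` computes.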
Model Capabilities
Camera pose estimation
Monocular depth estimation
Video depth estimation
Dense point cloud estimation
Use Cases
3D reconstruction
Reconstruct 3D scenes from videos
Use video frames as input to reconstruct 3D point cloud scenes.
Achieve state-of-the-art reconstruction performance
Reconstruct from unordered image sets
Reconstruct 3D scenes from an unordered set of images without a fixed reference view.
Robust to the input order
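Given per-view camera poses and local point maps, a scene-level point cloud can be assembled by moving each view's points into a shared frame. The following Python sketch illustrates that step under stated assumptions: each pose is a 4x4 camera-to-world transform, each local point map is a list of 3D points in its own camera's coordinates, and all views already share one scale. These conventions and the helper names are assumptions for illustration, not the π³ API.

```python
def transform_point(pose, point):
    """Apply a 4x4 homogeneous transform (nested lists) to a 3D point."""
    x, y, z = point
    return [
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    ]


def fuse_point_maps(poses, local_point_maps):
    """Merge per-view local points into one list of world-frame points.

    Assumes each pose maps camera coordinates to world coordinates
    (camera-to-world); this convention is an assumption of the sketch.
    """
    cloud = []
    for pose, points in zip(poses, local_point_maps):
        cloud.extend(transform_point(pose, p) for p in points)
    return cloud


# View 0 uses the identity pose.
pose0 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# View 1 is rotated 90 degrees about the y axis and shifted 1 unit along x.
pose1 = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# One made-up point per view, each 1 or 2 units in front of its camera.
local_maps = [[[0.0, 0.0, 2.0]], [[0.0, 0.0, 1.0]]]

cloud = fuse_point_maps([pose0, pose1], local_maps)
print(cloud)
```

Because every view carries its own pose, swapping the order of the views only changes the order of points in the fused list, not the points themselves.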