[CVPR 2025] Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
TODO List
- [x] Release pre-trained weights and inference code.
- [x] Release evaluation code for ScanNet1500, MegaDepth1500 and Cambridge datasets.
- [x] Release sample code for self-captured images and videos.
- [x] Release training code and data.
- [ ] Release evaluation code for other datasets.
- [ ] Release the accelerated version for visual localization.
- [ ] Release Gradio Demo.
Installation
- Clone Reloc3r
git clone --recursive https://github.com/ffrivera0/reloc3r.git
cd reloc3r
- Create the environment using conda
conda create -n reloc3r python=3.11 cmake=3.14.0
conda activate reloc3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
pip install -r requirements_optional.txt
- Optional: Compile the cuda kernels for RoPE
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
- Optional: Download the checkpoints Reloc3r-224/Reloc3r-512. If you skip this step, the pre-trained weights are downloaded automatically when you run the evaluation and demo code below.
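If you download the weights manually, a quick sanity check of the file with plain PyTorch is sketched below; the local path is a placeholder and the actual checkpoint layout may differ.

```python
# Sketch: sanity-check a manually downloaded checkpoint. The path is a
# placeholder; the key layout ('model' / 'state_dict' / flat) may differ.
import torch

ckpt = torch.load("checkpoints/Reloc3r-512.pth", map_location="cpu")
state = ckpt.get("model", ckpt.get("state_dict", ckpt)) if isinstance(ckpt, dict) else ckpt
print(f"{len(state)} tensors" if hasattr(state, "__len__") else type(state))
```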
Usage
Using Reloc3r, you can estimate camera poses for images and videos you captured.
For relative pose estimation, try the demo code in wild_relpose.py. We provide some image pairs used in our paper.
python wild_relpose.py --v1_path ./data/wild_images/zurich0.jpg --v2_path ./data/wild_images/zurich1.jpg --output_folder ./data/wild_images/
Visualize the relative pose
python visualization.py --mode relpose --pose_path ./data/wild_images/pose2to1.txt
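If you want to inspect the saved pose programmatically, the sketch below assumes pose2to1.txt stores a 4x4 homogeneous transform as plain text (the pose of the second view in the first view's frame); double-check the format against the demo script before relying on it.

```python
# Sketch: inspect the saved relative pose. Assumes a 4x4 homogeneous matrix
# stored as plain text, as written by the demo above.
import numpy as np

pose = np.loadtxt("./data/wild_images/pose2to1.txt")  # expected shape: (4, 4)
R, t = pose[:3, :3], pose[:3, 3]

# Rotation angle (degrees) from the trace of R.
angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
print(f"relative rotation: {angle:.1f} deg")

# Relative pose regression recovers translation only up to scale,
# so report the direction rather than a metric distance.
print("translation direction:", t / (np.linalg.norm(t) + 1e-12))
```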
For visual localization, the demo code in wild_visloc.py estimates absolute camera poses from sampled frames in self-captured videos.
[!IMPORTANT]
The demo simply uses the first and last frames as the database, which requires overlapping regions among all images. This demo does not support linear motion. We provide some videos as examples.
python wild_visloc.py --video_path ./data/wild_video/ids.MOV --output_folder ./data/wild_video
Visualize the absolute poses
python visualization.py --mode visloc --pose_folder ./data/wild_video/ids_poses/
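For intuition on how absolute poses follow from regressed relative poses, here is an illustrative sketch (not the repository's implementation) that chains a known camera-to-world pose of a database frame with a relative pose of the query expressed in that database frame; the file paths and matrix conventions are assumptions, so verify them against the code.

```python
# Illustrative only: place a query camera in world coordinates by chaining
# a database frame's absolute pose with a regressed relative pose.
# Assumed conventions: 4x4 camera-to-world matrices, and T_db_from_query
# mapping query-camera coordinates into the database-camera frame.
import numpy as np

T_world_from_db = np.loadtxt("db_pose_c2w.txt")        # placeholder path
T_db_from_query = np.loadtxt("relpose_db_from_q.txt")  # placeholder path

# Compose transforms: world <- db <- query.
T_world_from_query = T_world_from_db @ T_db_from_query

print("query camera center (world):", T_world_from_query[:3, 3])
```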
Evaluation on Relative Camera Pose Estimation
To reproduce our evaluation on ScanNet1500 and MegaDepth1500, download the datasets here and unzip them to ./data/.
Then run the following script. You will obtain results similar to those presented in our paper.
bash scripts/eval_relpose.sh
[!NOTE]
To achieve faster inference speed, set --amp=1. This enables evaluation with fp16, which increases speed from 24 FPS to 40 FPS on an RTX 4090 with Reloc3r-512, without any accuracy loss.
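If you want the same speed-up in your own inference code, mixed-precision evaluation in PyTorch amounts to wrapping the forward pass in an autocast context. The snippet below uses a tiny stand-in module so it runs on its own; substitute the loaded Reloc3r network and its two input views.

```python
# Sketch of fp16 inference via torch.autocast. The Linear layer is a stand-in
# so the snippet is self-contained; it is not the Reloc3r API.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 7).to(device).eval()   # stand-in for the pose regressor
x = torch.randn(1, 512, device=device)

with torch.inference_mode():
    # On CUDA, autocast runs matmuls/convs in fp16; on CPU, fall back to bf16.
    dtype = torch.float16 if device == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device, dtype=dtype):
        y = model(x)

print(y.dtype)  # torch.float16 on CUDA
```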
Evaluation on Visual Localization
To reproduce our evaluation on Cambridge, download the dataset here and unzip it to ./data/cambridge/.
Then run the following script. You will obtain results similar to those presented in our paper.
bash scripts/eval_visloc.sh
Training
We follow DUSt3R to process the training data. Download the datasets: CO3Dv2, ScanNet++, ARKitScenes, BlendedMVS, MegaDepth, DL3DV, RealEstate10K.
For each dataset, we provide a preprocessing script in the datasets_preprocess directory and, where needed, an archive containing the list of pairs. You have to download the datasets yourself from their official sources, agree to their licenses, and run the preprocessing scripts.
We provide a sample script to train Reloc3r with ScanNet++ on an RTX 3090 GPU
bash scripts/train_small.sh
To reproduce our training for Reloc3r-512 with 8 H800 GPUs, run the following script
bash scripts/train.sh
[!NOTE]
These scripts are not strictly equivalent to the ones used to train Reloc3r, but they should be close enough.
Citation
If you find our work helpful in your research, please consider citing:
@article{reloc3r,
title={Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization},
author={Dong, Siyan and Wang, Shuzhe and Liu, Shaohui and Cai, Lulu and Fan, Qingnan and Kannala, Juho and Yang, Yanchao},
journal={arXiv preprint arXiv:2412.08376},
year={2024}
}
Acknowledgments
Our implementation is based on several awesome repositories, including CroCo and DUSt3R. We thank the respective authors for open-sourcing their code.