NVComposer Open-Source Model: High-Quality 3D View Generation Without External Alignment

NVComposer

Developed by TencentARC

NVComposer is a generative multi-view novel view synthesis model that requires no explicit external alignment, achieving high-quality 3D view generation through image-pose dual-stream diffusion and geometry-aware feature alignment.

3D Vision | English | Open Source License: Other | #Image-to-3D Generation #Pose-Free Novel View Synthesis #Geometry-Aware Feature Alignment

Downloads 93

Release Time: 12/6/2024

Model Overview

This model improves the quality and flexibility of multi-view novel view synthesis by jointly generating the target novel views and the camera poses of the conditional views, and by incorporating a geometry-aware feature alignment module.

Model Features

No External Alignment Dependency

Simultaneously generates the target novel views and the camera poses of the conditional images through a dual-stream diffusion model, eliminating the need for explicit pose estimation or pre-reconstruction.

Geometry-Aware Feature Alignment

Distills geometric priors from pre-trained dense stereo models during training to improve feature alignment.

Multi-View Compatibility

Maintains stable performance even when the input views overlap little or occlude one another.

Model Capabilities

Single-image 3D view generation

Multi-view image synthesis

Camera pose estimation

Geometric feature extraction

Use Cases

3D Content Creation

Virtual Scene Construction

Generates complete 3D scenes from single or multiple images

Produces high-quality, multi-view consistent 3D views

Augmented Reality Applications

Rapidly generates 3D object views for AR applications

Real-time novel view generation capability meets AR scenario requirements

Film Special Effects

View Expansion

Generates additional views from limited footage

Reduces actual shooting workload while maintaining visual consistency

🚀 NVComposer

NVComposer is a novel approach for generative multi-view novel view synthesis (NVS) that eliminates the need for explicit external alignment, improving flexibility and accessibility.

🚀 Quick Start

NVComposer targets generative multi-view novel view synthesis (NVS) and removes the explicit external alignment step that existing methods require. To try it, download the model checkpoint as described under Installation.

✨ Features

Eliminates External Alignment: NVComposer enables the generative model to implicitly infer spatial and geometric relationships between multiple conditional views, removing the need for external multi-view alignment processes such as explicit pose estimation or pre-reconstruction.
Image-Pose Dual-Stream Diffusion Model: Simultaneously generates target novel views and condition camera poses.
Geometry-Aware Feature Alignment Module: Distills geometric priors from dense stereo models during training.

📚 Documentation

Abstract

Recent advancements in generative models have significantly improved novel view synthesis (NVS) from multi-view data. However, existing methods depend on external multi-view alignment processes, such as explicit pose estimation or pre-reconstruction, which limits their flexibility and accessibility, especially when alignment is unstable due to insufficient overlap or occlusions between views. In this paper, we propose NVComposer, a novel approach that eliminates the need for explicit external alignment. NVComposer enables the generative model to implicitly infer spatial and geometric relationships between multiple conditional views by introducing two key components: 1) an image-pose dual-stream diffusion model that simultaneously generates target novel views and condition camera poses, and 2) a geometry-aware feature alignment module that distills geometric priors from dense stereo models during training. Extensive experiments demonstrate that NVComposer achieves state-of-the-art performance in generative multi-view NVS tasks, removing the reliance on external alignment and thus improving model accessibility. Our approach shows substantial improvements in synthesis quality as the number of unposed input views increases, highlighting its potential for more flexible and accessible generative NVS systems.

Method

NVComposer contains 1) an image-pose dual-stream diffusion model that generates novel views while implicitly estimating camera poses for conditional images, and 2) a geometry-aware feature alignment adapter that uses geometric priors distilled from pretrained dense stereo models.
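To make "camera poses for conditional images" more concrete: when no external alignment supplies a shared world frame, a pose is only meaningful relative to one of the views. The sketch below illustrates that generic convention with plain 4x4 camera-to-world matrices. It is an assumption for illustration only; this page does not state how NVComposer parameterizes poses, and the function names `invert_rigid`, `matmul4`, and `relative_pose` are hypothetical.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] given as nested lists."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    # For a rotation matrix the inverse is the transpose.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    # The inverse translation is -R^T t.
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def relative_pose(anchor_c2w, view_c2w):
    """Pose of `view` expressed in the anchor camera's frame (assumed convention)."""
    return matmul4(invert_rigid(anchor_c2w), view_c2w)
```

Under this convention the anchor view always has the identity pose, so only the remaining views carry pose information that a model would need to infer.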

📦 Installation

Download the model checkpoint with huggingface_hub (using version 0.1 as an example):

from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="TencentARC/NVComposer",
    filename="NVComposer-V0.1.ckpt"
)

hf_hub_download returns the local path of the downloaded file, so the checkpoint can be found at checkpoint_path.
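A slightly more defensive variant of the download step might look like the following. It is a sketch only and assumes huggingface_hub is installed (for example via `pip install huggingface_hub`). The helper names `checkpoint_filename` and `fetch_checkpoint` are hypothetical, and the `NVComposer-V{version}.ckpt` naming pattern is an assumption extrapolated from the single file name given above; only V0.1 is confirmed by this page.

```python
import os


def checkpoint_filename(version: str = "0.1") -> str:
    """Build a checkpoint file name (pattern assumed; only V0.1 is confirmed)."""
    return f"NVComposer-V{version}.ckpt"


def fetch_checkpoint(version: str = "0.1", cache_dir=None) -> str:
    """Download the checkpoint and return its local path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TencentARC/NVComposer",
        filename=checkpoint_filename(version),
        cache_dir=cache_dir,  # None falls back to the default Hugging Face cache
    )
    # Fail early with a clear message if the file is not where we expect it.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Expected a checkpoint file at {path}")
    return path
```

Calling `fetch_checkpoint()` with no arguments requests the same file as the snippet above; passing `cache_dir` keeps the checkpoint in a directory of your choosing instead of the default cache.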