ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization

¹ Intelligent Cloud Technologies Laboratory, Huawei Munich Research Center
² School of Computation, Information and Technology, TUM
Published in Pattern Recognition Letters 190 (2025) 118-125

Unlike prior optimization methods using only view-conditioned diffusion guidance or multi-view reconstruction, we utilize both for balanced optimization of rough shape and fine details, improving view consistency in content and visual quality.

Abstract

Recent advances in diffusion models have significantly improved 3D generation, enabling the use of assets generated from an image for embodied AI simulations. However, the one-to-many nature of the image-to-3D problem limits their use due to inconsistent content and quality across views. Previous models optimize a 3D model by sampling views from a view-conditioned diffusion prior, but diffusion models cannot guarantee view consistency. Instead, we present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them with another diffusion model through a score distillation sampling (SDS) loss. Thereby, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape. In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction. To balance between the rough shape and the fine-detail optimizations, we introduce dynamic task-dependent weights based on homoscedastic uncertainty, updated automatically in each iteration. Additionally, we employ opacity, depth distortion, and normal alignment losses to refine the surface for mesh extraction. Our method ensures better view consistency and visual quality compared to the state-of-the-art.
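
For reference, the SDS gradient mentioned above is commonly written in the form below, following its original formulation in DreamFusion. The notation is an assumption and is not taken from this page: θ denotes the parameters of the 3D representation, x the view rendered from a sampled camera (or its latent encoding, if the diffusion model operates in an embedded space), ε̂_φ the noise predicted by the diffusion model, y the conditioning signal (here presumably the closest prior view), and w(t) a timestep-dependent weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad \epsilon \sim \mathcal{N}(0, I)
```

Because the conditioning y changes the noise prediction, tying each random view to a fixed prior image is one way to keep the guidance for neighboring views from drifting apart.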


ConsistentDreamer pipeline. ConsistentDreamer is a Gaussian-based method for view-consistent 3D generation from a single image, guided by consistent multi-view images generated in a prior stage. The rough shape is optimized by improving random views with a diffusion model conditioned on the closest prior view, while fine details are refined by comparing all prior views to the corresponding views of the representation. Dynamic weights, updated based on the final loss, balance the rough and fine optimizations, and depth distortion, normal alignment, and opacity losses refine the surface for mesh extraction.
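
The step of conditioning a random view on its closest prior view can be illustrated with a minimal sketch. It assumes that views differ only in azimuth and that the prior images sit at four evenly spaced azimuths. Both are hypothetical choices for illustration, as are all function and variable names; the actual camera layout and selection rule are not specified on this page.

```python
import random

# Hypothetical azimuths (in degrees) of the fixed multi-view prior images.
PRIOR_AZIMUTHS = [0.0, 90.0, 180.0, 270.0]


def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def closest_prior_view(azimuth, prior_azimuths=PRIOR_AZIMUTHS):
    """Return the index of the prior view nearest to the given azimuth."""
    return min(
        range(len(prior_azimuths)),
        key=lambda i: angular_distance(azimuth, prior_azimuths[i]),
    )


def sample_view(rng):
    """Sample a random azimuth and pair it with its conditioning prior view."""
    azimuth = rng.uniform(0.0, 360.0)
    return azimuth, closest_prior_view(azimuth)


rng = random.Random(0)
azimuth, prior_index = sample_view(rng)
# The rendered view at `azimuth` would then be guided by a diffusion model
# conditioned on the prior image with index `prior_index`.
```

The wrap-around in `angular_distance` matters: a view at 350° is 10° from the prior at 0°, not 350° away from it.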


The effect of loss balancing. The rough-shape and fine-detail optimizations are unstable, with fluctuations in the corresponding losses leading to unnecessary densification of the Gaussian representation. Although both tasks use a photometric error, they require different optimization schedules because their losses differ in scale, which arises from operating in different spaces (embedded space and image space). To automatically optimize the schedules for both loss components, we introduce dynamic loss weights based on homoscedastic uncertainty, a task-dependent uncertainty that remains constant across inputs. To show the effect of this balancing, we plot the rough and fine losses throughout optimization without and with our proposed loss balancing in (a) and (b), respectively. Panel (c) shows the loss weight values over the course of the optimization in (b). The automatic weight updates lead to more stable optimization, toning down the rough shape loss by increasing its weight at a higher rate while also stabilizing local peaks in the fine detail optimization caused by Gaussian densification.
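
A minimal sketch of homoscedastic-uncertainty weighting may help make the idea concrete. It uses the widely used log-variance form, in which each task i has a learnable parameter s_i = log σ_i² and the total loss is Σ_i exp(-s_i)·L_i + s_i. This form, the constant toy losses, the learning rate, and all names are assumptions for illustration; the exact formulation used by ConsistentDreamer is not given on this page.

```python
import math


def balanced_loss(losses, log_vars):
    """Total loss: sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


def update_log_vars(losses, log_vars, lr=0.05):
    """One gradient-descent step on every s_i.

    d/ds_i [exp(-s_i) * L_i + s_i] = 1 - exp(-s_i) * L_i
    """
    return [
        s - lr * (1.0 - math.exp(-s) * loss)
        for loss, s in zip(losses, log_vars)
    ]


def effective_weights(log_vars):
    """Multiplier applied to each task loss."""
    return [math.exp(-s) for s in log_vars]


# Toy run: two tasks whose losses live on very different scales
# (hypothetical constants standing in for the rough and fine losses).
losses = [50.0, 0.5]
log_vars = [0.0, 0.0]
for _ in range(2000):
    log_vars = update_log_vars(losses, log_vars)

weights = effective_weights(log_vars)
weighted = [w * loss for w, loss in zip(weights, losses)]
```

In this toy setting each s_i settles where exp(-s_i)·L_i = 1, so the two weighted losses end up on the same scale even though the raw losses differ by a factor of 100. In an actual training loop the losses change every iteration and the s_i would be updated by the optimizer together with the Gaussian parameters.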

Qualitative Results on the Google Scanned Objects Dataset

Qualitative Results on Images from the Internet

BibTeX

@misc{şahin2025consistentdreamerviewconsistentmeshesbalanced,
        title={ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization}, 
        author={Onat Şahin and Mohammad Altillawi and George Eskandar and Carlos Carbone and Ziyuan Liu},
        year={2025},
        eprint={2502.09278},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2502.09278}, 
  }