Refining Few-Step Text-to-Multiview Diffusion via Reinforcement Learning

arXiv:2505.20107v2 Announce Type: replace

Text-to-multiview (T2MV) diffusion models have shown great promise in generating multiple views of a scene from a single text prompt. While few-step backbones enable real-time T2MV generation, they often compromise key aspects of generation quality, such as per-view fidelity and cross-view consistency.