CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration

arXiv:2509.17458v3 Announce Type: replace-cross Text-to-image diffusion models, such as Stable Diffusion, can produce high-quality and diverse images but often fail to achieve compositional alignment, particularly when prompts describe complex object relationships, attributes, or spatial arrangements. Recent inference-time approaches address this without fine-tuning the model: they optimize or explore the initial noise under the guidance of reward functions that score text-image alignment.
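
To make the two inference-time strategies concrete, the following is a minimal toy sketch, not the method proposed in the paper. It assumes a hypothetical frozen generator (`toy_generator`, a simple `tanh` map standing in for a diffusion sampler) and a hypothetical reward (`toy_reward`, a negative squared error standing in for a text-image alignment score). Exploration draws several initial noises and keeps the highest-reward one; optimization then refines that noise by gradient ascent on the reward while the generator stays fixed.

```python
import math
import random


def toy_generator(noise):
    # Hypothetical stand-in for a frozen diffusion sampler: a fixed nonlinear map.
    return [math.tanh(z) for z in noise]


def toy_reward(image, target):
    # Hypothetical stand-in for a text-image alignment score (higher is better).
    return -sum((x - t) ** 2 for x, t in zip(image, target))


def explore(target, n_candidates, rng):
    # Exploration: sample several initial noises and keep the best-scoring one.
    candidates = [[rng.gauss(0.0, 1.0) for _ in target] for _ in range(n_candidates)]
    best = max(candidates, key=lambda z: toy_reward(toy_generator(z), target))
    return best, toy_reward(toy_generator(best), target)


def optimize(noise, target, steps=50, lr=0.1):
    # Optimization: gradient ascent on the reward with respect to the noise only.
    z = list(noise)
    for _ in range(steps):
        img = toy_generator(z)
        # Analytic gradient of the toy reward through tanh: d(reward)/dz.
        grad = [-2.0 * (x - t) * (1.0 - x * x) for x, t in zip(img, target)]
        z = [zi + lr * g for zi, g in zip(z, grad)]
    return z, toy_reward(toy_generator(z), target)


rng = random.Random(0)
target = [0.5, -0.5, 0.2]  # assumed toy "prompt" embedding
z_best, r_best = explore(target, n_candidates=8, rng=rng)
z_opt, r_opt = optimize(z_best, target)
print(r_best, r_opt)
```

In a real pipeline the generator would be the full denoising process and the gradient would come from automatic differentiation through it; the toy keeps both analytic so that the exploration-then-optimization flow is easy to follow.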