Step-level Denoising-time Diffusion Alignment with Multiple Objectives

arXiv:2604.14379v1 Announce Type: cross

Reinforcement learning (RL) has emerged as a powerful tool for aligning diffusion models with human preferences, typically by optimizing a single reward function under a KL-divergence regularization constraint. In practice, however, human preferences are inherently pluralistic, and aligned models must balance multiple downstream objectives, such as aesthetic quality and text-image consistency.
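The single-reward setup mentioned above is commonly written as maximizing the expected reward minus β times the KL divergence from a reference (pretrained) model. One common way to fold several rewards into that objective is linear scalarization with weights w_i. That choice is assumed here only for illustration and is not necessarily the method proposed in this paper. On a finite outcome space the maximizer has the well-known closed form p*(x) ∝ p_ref(x) · exp(r_w(x) / β), where r_w = Σ_i w_i r_i. The toy sketch below computes this optimum. A four-outcome distribution stands in for the image distribution, and the function name, reward vectors, and weights are all hypothetical.

```python
import math


def kl_regularized_optimum(p_ref, rewards, weights, beta):
    """Hypothetical helper: closed-form maximizer of
        E_p[sum_i w_i * r_i(x)] - beta * KL(p || p_ref)
    over distributions p on a finite set of outcomes.

    p_ref   -- reference probabilities, one per outcome
    rewards -- list of objectives; rewards[i][k] is objective i on outcome k
    weights -- scalarization weights w_i (an illustrative assumption)
    beta    -- KL regularization strength (> 0)
    """
    n = len(p_ref)
    # Linear scalarization: r_w(x_k) = sum_i w_i * r_i(x_k)
    scalar = [sum(w * r[k] for w, r in zip(weights, rewards)) for k in range(n)]
    # Exponentially tilt the reference: p*(x) is proportional to p_ref(x) * exp(r_w(x) / beta)
    unnorm = [p * math.exp(s / beta) for p, s in zip(p_ref, scalar)]
    z = sum(unnorm)  # normalizing constant
    return [u / z for u in unnorm]


# Toy usage: two made-up objectives over four outcomes.
p_ref = [0.25] * 4
aesthetic = [1.0, 0.0, 1.0, 0.0]   # hypothetical "aesthetic quality" scores
alignment = [0.0, 1.0, 1.0, 0.0]   # hypothetical "text-image consistency" scores
p_star = kl_regularized_optimum(p_ref, [aesthetic, alignment], [0.5, 0.5], beta=0.5)
print(p_star)
```

With equal weights, the outcome that scores well on both objectives (index 2) receives the most probability mass. As β grows, p* moves back toward p_ref. Changing the weights shifts which trade-off the optimum favors, and this is exactly the tension between multiple objectives that the abstract describes.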