MARBLE: Multi-Aspect Reward Balance for Diffusion RL

arXiv:2605.06507v1 Announce Type: cross

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, so multiple evaluation criteria must be optimized simultaneously. Existing practice deals with multiple rewards by