Diffusion Reinforcement Learning via Centered Reward Distillation

arXiv:2603.14128v1 Announce Type: cross Diffusion and flow models achieve state-of-the-art (SOTA) generative performance, yet many practically important behaviors, such as fine-grained prompt fidelity, compositional correctness, and text rendering, are weakly specified by score or flow matching pre