AI RESEARCH

Stepwise Credit Assignment for GRPO on Flow-Matching Models

arXiv CS.AI

arXiv:2603.28718v1 Announce Type: cross Flow-GRPO successfully applies reinforcement learning to flow models, but uses uniform credit assignment across all steps. This ignores the temporal structure of diffusion generation: early steps determine composition and content (low-frequency structure), while late steps resolve details and textures (high-frequency details). Moreover, assigning uniform credit based solely on the final image can inadvertently reward suboptimal intermediate steps, especially when errors are corrected later in the diffusion trajectory.
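
To make the uniform credit assignment described above concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual GRPO-style normalization of final-image rewards within a group of samples from the same prompt, then copies that single advantage to every denoising step. The function names, the reward values, the use of population standard deviation, and the `eps` constant are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(final_rewards, eps=1e-6):
    """Normalize final-image rewards within one group (GRPO-style).

    Assumption: population standard deviation plus a small eps for
    numerical stability; actual implementations may differ.
    """
    mu = mean(final_rewards)
    sigma = pstdev(final_rewards)
    return [(r - mu) / (sigma + eps) for r in final_rewards]


def uniform_step_credit(final_rewards, num_steps):
    """Broadcast each sample's single advantage to every denoising step."""
    advantages = group_relative_advantages(final_rewards)
    # Every step of a trajectory receives the same scalar, regardless of
    # whether that step helped or hurt the final image.
    return [[a] * num_steps for a in advantages]


# Hypothetical rewards for a group of 4 images from the same prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
credit = uniform_step_credit(rewards, num_steps=5)

for i, row in enumerate(credit):
    print(f"sample {i}: per-step credit = {[round(a, 3) for a in row]}")
```

Each row of `credit` is constant across steps, which is exactly the property the abstract criticizes: an early step that set a poor composition and a late step that repaired it receive identical credit.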