LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

arXiv:2604.15311v1 Announce Type: new This paper focuses on aligning flow matching models with human preferences. A promising approach is to fine-tune the model by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories incurs prohibitive memory costs and causes gradient explosion. As a result, direct-gradient methods struggle to update early generation steps, which are crucial for determining the global structure of the final image. To address this issue, we