AI RESEARCH

Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models

arXiv CS.LG

ArXi:2511.16955v2 Announce Type: replace-cross Group Relative Policy Optimization (GRPO) has shown promise in aligning image and video generative models with human preferences. However, applying it to modern flow matching models is challenging because of its deterministic sampling paradigm. Current methods address this issue by converting Ordinary Differential Equations (ODEs) to Stochastic Differential Equations (SDEs), which