Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs

arXiv:2605.09433v1 Existing preference datasets for text-to-image models typically store only the final winner/loser images. This representation is insufficient for rectified flow (RF) models, whose generation is naturally indexed by a specific prior noise sample and follows a nearly straight denoising trajectory. In contrast, prior DPO-style alignment for diffusion models commonly estimates trajectories using an independent forward noising process, which can be mismatched to the true reverse dynamics and