AI RESEARCH

It Takes Two: Your GRPO Is Secretly DPO

arXiv CS.LG • May 15, 2026

arXiv:2510.00977v3

GRPO has emerged as a prominent reinforcement learning algorithm for post-