AI RESEARCH
It Takes Two: Your GRPO Is Secretly DPO
arXiv CS.LG
arXiv:2510.00977v3 Announce Type: replace
GRPO has emerged as a prominent reinforcement learning algorithm for post-