
Stabilizing Reinforcement Learning for Diffusion Language Models

arXiv cs.LG

arXiv:2603.06743v1 Announce Type: new

Group Relative Policy Optimization (GRPO) is highly effective for post-