AI RESEARCH
Stabilizing Reinforcement Learning for Diffusion Language Models
arXiv CS.LG
arXiv:2603.06743v1 Announce Type: new
Group Relative Policy Optimization (GRPO) is highly effective for post-