AI RESEARCH
V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
arXiv cs.LG
arXiv:2604.23380v1 Announce Type: new Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-