AI RESEARCH
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
arXiv CS.AI
•
ArXi:2605.12483v1 Announce Type: cross