AI RESEARCH
A Comedy of Estimators: On KL Regularization in RL Training of LLMs
arXiv CS.LG
arXiv:2512.21852v3 Announce Type: replace The reasoning performance of large language models (LLMs) can be substantially improved by