AI RESEARCH

A Comedy of Estimators: On KL Regularization in RL Training of LLMs

arXiv CS.LG

arXiv:2512.21852v3

The reasoning performance of large language models (LLMs) can be substantially improved by