AI RESEARCH
When Importance Sampling Misallocates Credit: Asymmetric Ratios for Outcome-Supervised RL
arXiv CS.CL
arXiv:2510.06062v2 Announce Type: replace Reinforcement learning (RL) has shown great promise in large language models (LLMs) post-