AI RESEARCH

When Importance Sampling Misallocates Credit: Asymmetric Ratios for Outcome-Supervised RL

arXiv CS.CL

arXiv:2510.06062v2 Announce Type: replace

Reinforcement learning (RL) has shown great promise in large language models (LLMs) post-