AI RESEARCH

Future Policy Approximation for Offline Reinforcement Learning Improves Mathematical Reasoning

arXiv cs.CL

arXiv:2509.19893v2 Announce Type: replace

Reinforcement Learning (RL) has emerged as the key driver for post-