AI RESEARCH
Future Policy Approximation for Offline Reinforcement Learning Improves Mathematical Reasoning
arXiv cs.CL
arXiv:2509.19893v2 Announce Type: replace
Reinforcement Learning (RL) has emerged as the key driver for post-