Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

arXiv:2605.06638v1 Announce Type: cross
Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how