Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

arXiv:2605.06241v1

Reinforcement learning has become the standard approach for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; instead, it redistributes probability mass over solutions the base model already contains.
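The "redistribution" claim can be illustrated with the well-known closed-form optimum of KL-regularized reward maximization, pi*(y) proportional to pi_base(y) * exp(r(y) / beta). The Python sketch below is an illustrative assumption, not the paper's method. The candidate solutions, rewards, and beta are invented toy values, and `kl_regularized_optimum` is a hypothetical helper.

```python
import math


def kl_regularized_optimum(base_probs, rewards, beta):
    """Closed-form optimum of max_pi E_pi[r] - beta * KL(pi || base).

    Illustrative assumption: pi*(y) = base(y) * exp(r(y) / beta) / Z.
    """
    # Each weight is the base probability scaled by a positive factor,
    # so solutions with zero base probability keep zero weight.
    weights = {y: p * math.exp(rewards[y] / beta) for y, p in base_probs.items()}
    z = sum(weights.values())  # normalizing constant Z
    return {y: w / z for y, w in weights.items()}


# Hypothetical toy base model over four candidate solutions.
# "d" is a correct solution the base model never generates (probability 0).
base = {"a": 0.6, "b": 0.3, "c": 0.1, "d": 0.0}
reward = {"a": 0.0, "b": 0.0, "c": 1.0, "d": 1.0}  # 1.0 marks a correct solution

tuned = kl_regularized_optimum(base, reward, beta=0.2)
for y in base:
    print(y, round(base[y], 3), round(tuned[y], 3))
```

Because every weight is a multiple of the base probability, the tuned policy shifts mass toward the correct solution the base model already produces ("c") but never places mass on "d". Under this toy model, the effect of RL is to sharpen selection among existing solutions rather than to add new ones.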