The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards

arXiv:2602.14872v2. Reinforcement learning with verifiable rewards (RLVR) has been a main driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcomes can help overcome the long-horizon barrier to extended reasoning. To understand this, we develop a theory of the