SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology

arXiv:2603.27977v1 Announce Type: new Reinforcement learning has become central to improving large reasoning models, but its success still relies heavily on verifiable rewards or labeled supervision. This limits its applicability to open-ended domains where correctness is ambiguous and cannot be verified. Moreover, reasoning trajectories remain largely unconstrained, and optimizing toward the final answer can favor early exploitation over generalization.