Shorten After You're Right: Lazy Length Penalties for Reasoning RL

arXiv:2505.12284v3 Announce Type: replace Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often incur a long reasoning path with significant memory and time costs. Existing methods primarily aim to shorten reasoning paths by