Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models

arXiv:2601.03969v2 Announce Type: replace-cross
Large reasoning models enhanced by reinforcement learning with verifiable rewards have achieved significant performance gains by extending their chain-of-thought. However, this paradigm incurs substantial deployment costs, as models often exhibit excessive verbosity on simple queries. Existing efficient reasoning methods relying on explicit length penalties often