DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

arXiv:2510.27419v2 Announce Type: replace Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. While existing methods that use supervised fine-tuning (SFT) or reinforcement learning (RL) with token-length rewards can improve efficiency, they often do so at the cost of accuracy. This paper