AI RESEARCH
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
arXiv CS.AI
arXiv:2510.27419v2
Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. While existing methods that use supervised fine-tuning (SFT) or reinforcement learning (RL) with token-length rewards can improve efficiency, they often do so at the cost of accuracy. This paper