ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression

arXiv cs.LG

arXiv:2605.07501v1. Large reasoning models (LRMs) achieve strong performance via extended chain-of-thought (CoT) reasoning, yet suffer from excessive token consumption and high inference latency. Existing reinforcement learning (RL) approaches for CoT compression rely on uniform, static length penalties that account for neither the model's evolving capability nor problem-level variation in difficulty. We propose ExpThink, an RL framework that addresses both dimensions through two complementary mechanisms.