Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

arXiv:2510.02249v2 Announce Type: replace-cross Large Language Models (LLMs) have demonstrated remarkable reasoning abilities on complex problems using long Chain-of-Thought (CoT) reasoning. However, they often suffer from overthinking, that is, generating unnecessarily lengthy reasoning steps for simpler problems. This issue may degrade the efficiency of the models and make it difficult for them to adapt their reasoning depth to the complexity of the problem. To address this, we