When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling

arXiv:2604.10739v1 Announce Type: new Scaling test-time compute through extended chains of thought has become a dominant paradigm for improving large language model reasoning. However, existing research implicitly assumes that longer thinking always yields better results, an assumption that remains largely unexamined. We systematically investigate how the marginal utility of additional reasoning tokens changes as compute budgets increase.