Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

arXiv:2605.17672v1 Announce Type: new Large Reasoning Models (LRMs) achieve strong performance by generating long chains of thought (CoT), but often overthink, continuing to reason after a solution has already stabilized and thereby wasting tokens and increasing latency. Existing inference-time early-exit methods rely primarily on answer-level signals, such as confidence or trial-answer consistency, to decide when to stop.
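
To make the answer-level baseline concrete, the following is a minimal sketch of a trial-answer consistency rule of the kind the abstract attributes to existing methods. It is not the paper's proposed method. It assumes that a trial answer has already been extracted at each reasoning checkpoint; the names `normalize`, `first_exit_step`, and the `window` parameter are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.strip().lower().split())


def first_exit_step(trial_answers: Sequence[str], window: int = 3) -> Optional[int]:
    """Return the index of the first checkpoint at which the last `window`
    trial answers are identical, or None if that never happens.

    Assumption: `trial_answers[i]` is the answer the model would give if it
    stopped reasoning at checkpoint i (hypothetical input, not from the paper).
    """
    if window < 1:
        raise ValueError("window must be >= 1")

    run = 0          # length of the current streak of identical answers
    previous = None  # normalized answer seen at the previous checkpoint
    for step, raw in enumerate(trial_answers):
        current = normalize(raw)
        run = run + 1 if current == previous else 1
        previous = current
        if run >= window:
            return step  # answer has been stable for `window` checkpoints
    return None


if __name__ == "__main__":
    answers = ["12", "14", "14 ", "14", "14"]
    print(first_exit_step(answers, window=3))
```

A rule like this looks only at the extracted answers, not at the reasoning that produced them, which is the limitation of answer-level signals that the abstract points to.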