Entropy After </Think> for reasoning model early exiting
arXiv cs.LG
arXiv:2509.26522v3 Announce Type: replace
Reasoning LLMs show improved performance with longer chains of thought. However, recent work has highlighted their tendency to overthink: they continue to revise answers even after reaching the correct solution. We quantitatively confirm this inefficiency from the perspective of distribution dynamics by tracking Pass@1 for answers, averaged over a large number of rollouts. We find that the model often starts producing the correct answer consistently early in the reasoning, which makes the additional reasoning tokens wasteful.
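One way to read the measurement described above is as a pass rate computed at successive reasoning checkpoints, each averaged over many rollouts. The sketch below is illustrative only and is not the authors' code: the data layout (`correct[k][r]`), the function names, the saturation threshold, and the token counts are all assumptions made for the example.

```python
from typing import List, Optional


def pass_rate_curve(correct: List[List[bool]]) -> List[float]:
    """Average correctness over rollouts at each reasoning checkpoint.

    Assumed layout (hypothetical): correct[k][r] is True when rollout r,
    sampled after truncating the reasoning at checkpoint k, yields the
    right answer. Every checkpoint must have at least one rollout.
    """
    return [sum(row) / len(row) for row in correct]


def first_saturation_checkpoint(
    curve: List[float], threshold: float = 1.0
) -> Optional[int]:
    """Earliest checkpoint from which the pass rate stays >= threshold
    through the final checkpoint, or None if the last one is below it."""
    idx = None
    # Walk backwards so a late dip below the threshold resets the result.
    for k in range(len(curve) - 1, -1, -1):
        if curve[k] >= threshold:
            idx = k
        else:
            break
    return idx


def wasted_fraction(checkpoint_tokens: List[int], saturation_idx: int) -> float:
    """Share of reasoning tokens spent after the pass rate has saturated."""
    total = checkpoint_tokens[-1]
    return (total - checkpoint_tokens[saturation_idx]) / total


# Toy data: 4 checkpoints, 4 rollouts each (made up for illustration).
correct = [
    [False, False, True, False],
    [True, True, False, True],
    [True, True, True, True],
    [True, True, True, True],
]
tokens = [256, 512, 768, 1024]  # hypothetical cumulative reasoning tokens

curve = pass_rate_curve(correct)
k_star = first_saturation_checkpoint(curve)
print(curve, k_star, wasted_fraction(tokens, k_star))
```

In this toy case the pass rate reaches 1.0 at the third checkpoint and stays there, so the final quarter of the reasoning tokens does not change the answer. A threshold below 1.0 could be used when the number of rollouts is small and the estimate is noisy.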