AI RESEARCH

The Detection--Extraction Gap: Models Know the Answer Before They Can Say It

arXiv CS.LG

ArXi:2604.06613v1 Announce Type: cross Modern reasoning models continue generating long after the answer is already determined. Across five model configurations, two families, and three benchmarks, we find that \textbf{52--88\% of chain-of-thought tokens are produced after the answer is recoverable} from a partial prefix. This post-commitment generation reveals a structural phenomenon: the \textbf{detection--extraction gap}. Free continuations from early prefixes recover the correct answer even at 10\% of the trace, while forced extraction fails on 42\% of these cases.