The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies

arXiv:2605.10799v1 Announce Type: cross

Corruption studies, the primary tool for evaluating chain-of-thought (CoT) faithfulness, identify which chain positions are "computationally important" by measuring accuracy when steps are replaced with errors. We identify a systematic confound: for chains with explicit terminal answer statements, the dominant format in standard benchmarks, corruption studies detect where the answer text appears, not where computation occurs.
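
The confound can be illustrated with a toy sketch. This is not the paper's experimental setup. The chains, the `corrupt_step` error rule, and the `answer_copier` "model" are all hypothetical and exist only for illustration. The stand-in model does no computation at all: it copies the number from the final step. A position-wise corruption study still reports the final position as the only "important" one, because that is where the answer text sits.

```python
import re


def corrupt_step(step: str) -> str:
    """Toy corruption (an assumption): add 1 to every integer in the step."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + 1), step)


def answer_copier(chain):
    """Hypothetical stand-in for a model: return the last integer in the
    final step, ignoring every earlier step."""
    numbers = re.findall(r"\d+", chain[-1])
    return int(numbers[-1]) if numbers else None


def accuracy_by_position(chains, gold, predict):
    """Corrupt one position at a time and record accuracy for each position.
    Assumes all chains have the same number of steps."""
    n_steps = len(chains[0])
    scores = []
    for pos in range(n_steps):
        correct = 0
        for chain, answer in zip(chains, gold):
            corrupted = list(chain)
            corrupted[pos] = corrupt_step(corrupted[pos])
            correct += predict(corrupted) == answer
        scores.append(correct / len(chains))
    return scores


# Illustrative chains that end in an explicit terminal answer statement.
chains = [
    ["3 + 4 = 7", "7 * 2 = 14", "The answer is 14."],
    ["10 - 6 = 4", "4 * 5 = 20", "The answer is 20."],
]
gold = [14, 20]

scores = accuracy_by_position(chains, gold, answer_copier)
print(scores)  # [1.0, 1.0, 0.0]
```

Accuracy collapses only when the terminal answer statement is corrupted, even though the arithmetic happens in the first two steps. Under these assumptions, the corruption profile tracks the location of the answer text and says nothing about where computation occurs.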