3 Classifiers, 3 Answers: Why CoT Faithfulness Scores Are Meaningless

LLM Chain-of-Thought (CoT), the mechanism where models output their reasoning process as text before answering, has been treated as a window into model thinking. The question of whether CoT actually reflects internal reasoning (faithfulness) has attracted serious research. Numbers like "DeepSeek-R1 acknowledges hints 39% of the time" circulate as if they were objective measurements. But can you trust those numbers? A March 2026 arXiv paper (Young, 2026) demolished this assumption.