The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It

arXiv:2605.03258v1

Large language models often fail at simple counting tasks, even when the items to count are explicitly present in the prompt. We investigate whether this failure occurs because transformers do not represent counts internally, or because they cannot convert those representations into the correct output tokens. Across three model families (Pythia, Qwen3, and Mistral) ranging from 0.4B to 14B parameters, we find strong evidence for the second explanation.