The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

arXiv:2605.18079v1 Announce Type: new Existing expressivity results for transformers typically rely on hardmax attention, high precision, and other architectural modifications that disconnect them from the models used in practice. We bridge this gap by analyzing standard transformer decoders with softmax attention in which activations and attention weights are rounded, while allowing depth and width to grow logarithmically with the context length.
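
To make the setting concrete, the following is a minimal sketch of a single softmax attention head whose scores, attention weights, and outputs are rounded to a fixed number of fractional bits. The rounding scheme (nearest multiple of `2**-frac_bits`), the function names, and the toy inputs are illustrative assumptions, not the construction used in the paper.

```python
import math


def round_to_precision(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (assumed fixed-point scheme)."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale


def rounded_softmax_attention(query, keys, values, frac_bits):
    """Single-query softmax attention with rounding after each stage.

    query: list of d floats; keys: list of n lists of d floats;
    values: list of n lists of m floats. Returns (weights, output).
    """
    d = len(query)
    # Scaled dot-product scores, rounded to the working precision.
    scores = [
        round_to_precision(sum(q * k for q, k in zip(query, key)) / math.sqrt(d), frac_bits)
        for key in keys
    ]
    # Numerically stable softmax: subtract the maximum score before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Rounding the attention weights can push small weights to exactly zero.
    weights = [round_to_precision(e / total, frac_bits) for e in exps]
    # Weighted sum of the value vectors, rounded once more.
    out_dim = len(values[0])
    output = [
        round_to_precision(sum(w * v[j] for w, v in zip(weights, values)), frac_bits)
        for j in range(out_dim)
    ]
    return weights, output


# Hypothetical toy input: one strongly matching key and two weak ones.
weights, output = rounded_softmax_attention(
    query=[4.0],
    keys=[[1.0], [0.0], [0.0]],
    values=[[1.0], [0.0], [0.0]],
    frac_bits=4,
)
print(weights, output)
```

With only 4 fractional bits in this toy input, the two small attention weights round to zero and the rounded weights no longer sum to one. Under this assumed scheme, softmax attention behaves somewhat like hardmax at low precision, which gives a rough intuition for why rounding matters when comparing softmax models with hardmax-based expressivity results.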