Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure

arXiv:2605.08740v1 Announce Type: cross Sparse autoencoders (SAEs) decompose transformer residual streams into interpretable feature dictionaries, yet the relationship between SAE width and causal influence on model output has not been systematically characterised. We