Do Sparse Autoencoders Capture Concept Manifolds?

arXiv:2604.28119v1

Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption that concepts correspond to independent linear directions. However, a growing body of evidence suggests that many concepts are instead organized along low-dimensional manifolds encoding continuous geometric relationships.
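The tension between "one concept, one direction" and "one concept, one manifold" can be shown with a toy example. This is a minimal sketch under stated assumptions, not the method studied here. It places a hypothetical cyclic concept (for example, seven weekdays) on a circle in a 2-D subspace. It then encodes each point with a hand-built dictionary using a top-1 ReLU code, meaning only one feature may be active, which mimics the sparsity pressure of an SAE encoder. The helper names (`circle_points`, `encode_top1`, `mean_sq_error`) are illustrative, and no training is involved. With a single atom, the 1-D circle is poorly reconstructed. Reconstruction improves only as more directions tile the manifold, so one continuous concept ends up split across several "features".

```python
import math

def circle_points(n, radius=1.0):
    """n evenly spaced points on a circle in a 2-D subspace (hypothetical toy concept)."""
    return [
        (radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]

def encode_top1(x, dictionary):
    """Sparsest code: keep only the atom with the largest ReLU(dot product) activation."""
    best_idx, best_act = None, 0.0
    for j, d in enumerate(dictionary):
        act = max(0.0, x[0] * d[0] + x[1] * d[1])  # ReLU encoder pre-activation
        if act > best_act:
            best_idx, best_act = j, act
    return best_idx, best_act

def reconstruct(idx, act, dictionary):
    """Linear decoder: activation times the chosen unit-norm atom (zero if nothing fired)."""
    if idx is None:
        return (0.0, 0.0)
    d = dictionary[idx]
    return (act * d[0], act * d[1])

def mean_sq_error(points, dictionary):
    """Average squared reconstruction error of the top-1 code over all points."""
    total = 0.0
    for x in points:
        idx, act = encode_top1(x, dictionary)
        r = reconstruct(idx, act, dictionary)
        total += (x[0] - r[0]) ** 2 + (x[1] - r[1]) ** 2
    return total / len(points)

if __name__ == "__main__":
    concept = circle_points(7)  # e.g. seven weekdays on a circle (assumed geometry)
    for k in (1, 3, 7):
        atoms = circle_points(k)  # k unit-norm dictionary directions spread around the circle
        print(k, "atoms -> MSE", round(mean_sq_error(concept, atoms), 4))
```

For a unit-norm atom `d`, the per-point error is `1 - (x·d)**2` when the atom fires and 1 when nothing fires. Error therefore falls only as atoms are added around the circle. In this toy setting, seven atoms are needed to reconstruct the seven points exactly, even though the underlying concept is a single angle.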