SMIXAE: Towards Unsupervised Manifold Discovery in Language Models

ArXi:2605.09224v1 Announce Type: new Sparse autoencoders (SAEs) have been used widely to decompose and interpret neural network activations, especially those of transformer language models. One key issue with SAEs is their inability to directly model multidimensional features. Instead, SAEs may tile such features by a set of independent directions that must be grouped together after the SAE