Sparsity as a Key: Unlocking New Insights from Latent Structures for Out-of-Distribution Detection

arXiv:2604.26409v1 Announce Type: new

Sparse Autoencoders (SAEs) have demonstrated significant success in interpreting Large Language Models (LLMs) by decomposing dense representations into sparse, semantic components. However, their potential for analyzing Vision Transformers (ViTs) remains largely underexplored. In this work, we present the first application of SAEs to the ViT [CLS] token for out-of-distribution (OOD) detection, addressing a limitation of existing methods: their reliance on entangled feature representations.