Mechanistic Interpretability of ASR models using Sparse Autoencoders

arXiv:2605.12225v1 Announce Type: new Understanding the internal workings of deep Transformer-based NLP models is more crucial than ever as these models see widespread use in domains that affect the public at large, such as industry, academia, finance, and health. While these models have advanced rapidly, their internal mechanisms remain largely a mystery. Techniques such as Sparse Autoencoders (SAEs) have emerged to help understand these mechanisms by projecting dense representations into sparse vectors.
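
To make the idea of projecting a dense representation into a sparse vector concrete, the following is a minimal sketch of a generic sparse autoencoder forward pass in plain Python. It is not the architecture used in this work: the ReLU encoder, the L1 sparsity penalty, the class name `SparseAutoencoder`, the dimensions, and the coefficient `l1_coeff` are all illustrative assumptions, and the weights are random rather than trained.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class SparseAutoencoder:
    """Hypothetical, untrained ReLU sparse autoencoder (illustration only)."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / d_model ** 0.5
        # Encoder maps d_model -> d_hidden (usually d_hidden > d_model).
        self.W_enc = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        # Decoder maps d_hidden -> d_model to reconstruct the input.
        self.W_dec = [[rng.gauss(0.0, scale) for _ in range(d_hidden)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Subtract the decoder bias, apply the encoder, then ReLU.
        # ReLU zeroes out negative pre-activations, which is what makes
        # the resulting code non-negative and (after training) sparse.
        centered = [xi - b for xi, b in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, z):
        # Reconstruct the dense representation from the code z.
        return [r + b for r, b in zip(matvec(self.W_dec, z), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that encourages sparsity.
        z = self.encode(x)
        x_hat = self.decode(z)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        l1 = sum(abs(v) for v in z)
        return mse + l1_coeff * l1, z


# Usage: a stand-in "dense activation" of width 8, expanded to 32 features.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
sae = SparseAutoencoder(d_model=8, d_hidden=32)
loss, z = sae.loss(x)
active = sum(1 for v in z if v > 0.0)
print(f"loss={loss:.4f}, active features={active}/{len(z)}")
```

In practice such a model would be trained on activations collected from a layer of the network under study, so that each hidden unit that fires can be inspected as a candidate interpretable feature. The hidden width and penalty strength are tuning choices, not values taken from the source.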