Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers

arXiv:2603.15919v1

Sparse neural networks are often hypothesized to be more interpretable than dense models, motivated by findings that weight sparsity can produce compact circuits in language models. However, it remains unclear whether structural sparsity by itself leads to improved semantic interpretability. In this work, we systematically evaluate the relationship between weight sparsity and interpretability in Vision Transformers using DeiT-III B/16 models pruned with Wanda. To assess interpretability comprehensively, we
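For readers unfamiliar with Wanda, the sketch below illustrates the pruning criterion as described in the original Wanda paper: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over a set of calibration tokens, |W_ij| · ||X_j||_2, and the lowest-scoring weights are zeroed separately within each output row. This is a minimal, dependency-free illustration, not the authors' pruning code for DeiT-III. The function name `wanda_prune`, the list-of-lists layout, and the toy calibration inputs are hypothetical.

```python
import math


def wanda_prune(weight, activations, sparsity):
    """Hypothetical sketch of Wanda-style unstructured pruning for one linear layer.

    weight: C_out rows, each a list of C_in floats (row i holds W[i][j]).
    activations: N calibration input vectors, each of length C_in.
    sparsity: fraction of weights to zero in every output row, in [0, 1).
    """
    c_in = len(weight[0])
    # ||X_j||_2: L2 norm of input feature j across all calibration tokens.
    feat_norm = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(c_in)]
    n_prune = int(c_in * sparsity)

    pruned = []
    for row in weight:
        # Wanda score for each weight in this output row: |W_ij| * ||X_j||_2.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        # Drop the n_prune lowest-scoring weights, compared within this row only.
        drop = set(sorted(range(c_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example: input feature 1 is always zero in the calibration data.
W = [[1.0, -2.0, 0.25, 3.0],
     [0.1, 0.2, -4.0, 1.0]]
X = [[1.0, 0.0, 2.0, 0.5],
     [1.0, 0.0, 2.0, 0.5]]
print(wanda_prune(W, X, 0.5))
```

With 50% sparsity, the weight -2.0 is removed even though it has a large magnitude, because its input feature never activates on the calibration data. This activation awareness is what distinguishes Wanda from plain magnitude pruning.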