Fundamental Limits of Neural Network Sparsification: Evidence from Catastrophic Interpretability Collapse
arXiv cs.LG
arXiv:2603.18056v1 (Announce Type: new)

Extreme neural network sparsification (a 90% reduction in activations) poses a critical challenge for mechanistic interpretability: do interpretable features survive aggressive compression? This work investigates feature survival under severe capacity constraints in hybrid Variational Autoencoder–Sparse Autoencoder (VAE-SAE) architectures. We