Causal Intervention Framework for Variational Auto Encoder Mechanistic Interpretability

arXiv:2505.03530v2

Mechanistic interpretability of deep learning models has emerged as a crucial research direction for understanding how neural networks function. While significant progress has been made in interpreting discriminative models like transformers, understanding generative models such as Variational Autoencoders (VAEs) remains challenging. This paper