From Mechanistic to Compositional Interpretability

arXiv:2605.08934v1 Announce Type: new Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be objectively verified, compared, or composed. We