Causal Interpretation of Neural Network Computations with Contribution Decomposition

arXiv:2603.06557v1

Understanding how neural networks transform inputs into outputs is crucial for interpreting and manipulating their behavior. Most existing approaches analyze internal representations by identifying hidden-layer activation patterns that correlate with human-interpretable concepts. Here we take a more direct approach and examine how hidden neurons act to drive network outputs. We