SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation

arXiv:2604.25315v1 Announce Type: new

Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy, unstable explanations that align poorly with semantically meaningful input features. We argue that a fundamental cause of this behavior lies in the geometry of learned representations: correlated feature dimensions diffuse attribution gradients across redundant directions, resulting in blurred and unreliable saliency maps.
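
The diffusion effect described above can be illustrated with a toy example. The sketch below is not the paper's method; it is a minimal NumPy illustration under stated assumptions: a linear readout fitted by minimum-norm least squares, a pair of perfectly duplicated feature dimensions, and a hypothetical helper `offdiag_correlation` that summarizes how correlated the feature dimensions are. For a linear readout, the gradient of the output with respect to each feature equals its weight, so the weights serve as a stand-in for attribution.

```python
import numpy as np


def offdiag_correlation(features):
    """Mean absolute off-diagonal Pearson correlation between feature dimensions.

    features: array of shape (n_samples, n_dims).
    Hypothetical diagnostic for illustration only.
    """
    corr = np.corrcoef(features, rowvar=False)
    mask = ~np.eye(corr.shape[0], dtype=bool)  # select off-diagonal entries
    return float(np.abs(corr[mask]).mean())


rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 1))  # a single underlying signal
target = z[:, 0]                # the output depends only on that signal

# Redundant representation: the same signal copied into two dimensions.
redundant = np.hstack([z, z])
# Decorrelated representation: the signal plus an unrelated dimension.
independent = np.hstack([z, rng.normal(size=(1000, 1))])

# Minimum-norm least-squares readout; for a linear model the gradient of the
# output with respect to each feature is simply the corresponding weight.
w_redundant = np.linalg.lstsq(redundant, target, rcond=None)[0]
w_independent = np.linalg.lstsq(independent, target, rcond=None)[0]

score_redundant = offdiag_correlation(redundant)
score_independent = offdiag_correlation(independent)

print("weights (redundant):  ", w_redundant)
print("weights (independent):", w_independent)
print("off-diagonal correlation:", score_redundant, score_independent)
```

In the redundant case the readout splits its weight evenly across the two copies, so neither dimension alone carries the full attribution, while in the decorrelated case the weight concentrates on the single informative dimension.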