Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing

arXiv:2603.07302v1 Announce Type: new

Gradient-based saliency methods such as Vanilla Gradient (VG) and Integrated Gradients (IG) are widely used to explain image classifiers, yet the resulting maps are often noisy and unstable, limiting their usefulness in high-stakes settings. Most prior work improves explanations by modifying the attribution algorithm, leaving open how the