"Faithful to What?" On the Limits of Fidelity-Based Explanations

arXiv:2506.12176v5 Announce Type: replace In explainable AI, surrogate models are commonly evaluated by their fidelity to a neural network's predictions. Fidelity, however, measures alignment to a learned model rather than alignment to the data-generating signal underlying the task. This work