Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments

arXiv:2604.07254v1 Announce Type: cross Deep neural networks can predict human judgments, but this does not imply that they rely on human-like information or reveal the cues underlying those judgments. Prior work has addressed this issue using attribution heatmaps, but the explanatory value of such heatmaps itself depends on their robustness. Here we tested the robustness of these explanations by evaluating whether models that predict human authenticity ratings also produce consistent explanations within and across architectures.
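
One way to make "consistent explanations" concrete is to compare the attribution heatmaps that different models produce for the same image. The sketch below is illustrative only: the abstract does not say which consistency measure was used, so the rank correlation here is an assumption, and the function names (`rank_flat`, `heatmap_agreement`, `pairwise_agreement`) are hypothetical. The heatmaps are random stand-ins, not real attributions.

```python
import numpy as np


def rank_flat(heatmap):
    """Return ranks 0..n-1 of the flattened heatmap (ties broken by position)."""
    flat = np.asarray(heatmap, dtype=float).ravel()
    order = np.argsort(flat, kind="stable")
    ranks = np.empty(flat.size, dtype=float)
    ranks[order] = np.arange(flat.size)
    return ranks


def heatmap_agreement(a, b):
    """Spearman-style rank correlation between two heatmaps of equal shape.

    Assumption: rank correlation is a reasonable stand-in for explanation
    consistency; the source does not name a metric.
    """
    ra, rb = rank_flat(a), rank_flat(b)
    return float(np.corrcoef(ra, rb)[0, 1])


def pairwise_agreement(heatmaps):
    """Symmetric matrix of agreement scores for a list of heatmaps."""
    n = len(heatmaps)
    scores = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            scores[i, j] = scores[j, i] = heatmap_agreement(heatmaps[i], heatmaps[j])
    return scores


# Stand-in data: one heatmap per model for the same image (random values,
# used only to show the call pattern).
rng = np.random.default_rng(0)
example_heatmaps = [rng.random((8, 8)) for _ in range(3)]
agreement_matrix = pairwise_agreement(example_heatmaps)
```

Under this assumed measure, off-diagonal values near 1 would indicate that two models highlight the same image regions, while values near 0 would indicate that similar predictive accuracy is accompanied by unrelated explanations.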