Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

arXiv:2605.19274v1 Announce Type: new LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations (where the model identifies input token spans as evidence alongside a generated rationale) and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency.
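
To make the two sides of the trade-off concrete, the following is a minimal sketch of how plausibility (span agreement) and faithfulness (comprehensiveness and sufficiency) are commonly computed. It assumes the ERASER-style definitions: comprehensiveness is the drop in predicted probability when the evidence tokens are removed, and sufficiency is the drop when only the evidence tokens are kept. The abstract does not state the exact formulation, so the paper's may differ. The names `toy_predict`, `span_f1`, and the example sentence are hypothetical and serve only as illustration.

```python
from typing import Callable, Sequence, Set

# A predictor maps a token sequence to the probability of the predicted class.
Predictor = Callable[[Sequence[str]], float]


def comprehensiveness(predict: Predictor, tokens: Sequence[str], evidence: Set[int]) -> float:
    """Probability drop after removing the evidence tokens (higher = more faithful)."""
    reduced = [t for i, t in enumerate(tokens) if i not in evidence]
    return predict(tokens) - predict(reduced)


def sufficiency(predict: Predictor, tokens: Sequence[str], evidence: Set[int]) -> float:
    """Probability drop when keeping only the evidence tokens (lower = more faithful)."""
    kept = [t for i, t in enumerate(tokens) if i in evidence]
    return predict(tokens) - predict(kept)


def span_f1(predicted: Set[int], human: Set[int]) -> float:
    """Token-level F1 between model evidence and a human rationale (plausibility)."""
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


def toy_predict(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a model: the prediction depends only on 'great'."""
    return 0.9 if "great" in tokens else 0.5


tokens = ["the", "film", "was", "great"]
human_rationale = {1, 3}   # hypothetical annotator marks "film" and "great"
causal_evidence = {3}      # the token the toy model actually relies on
plausible_only = {1}       # overlaps the human rationale but is not causal

for name, ev in [("causal", causal_evidence), ("plausible-only", plausible_only)]:
    print(
        name,
        round(span_f1(ev, human_rationale), 3),
        round(comprehensiveness(toy_predict, tokens, ev), 3),
        round(sufficiency(toy_predict, tokens, ev), 3),
    )
```

In this toy setup both evidence sets reach the same span agreement with the human rationale, yet only the causal one shows high comprehensiveness and low sufficiency. That gap between agreement and causal grounding is what the two faithfulness metrics are meant to expose.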