AI RESEARCH
ICE: Intervention-Consistent Explanation Evaluation with Statistical Grounding for LLMs
arXiv cs.AI
arXiv:2603.18579v1 Announce Type: cross Evaluating whether explanations faithfully reflect a model's reasoning remains an open problem. Existing benchmarks use single interventions without statistical testing, making it impossible to distinguish genuine faithfulness from chance-level performance. We