AI RESEARCH

When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

arXiv cs.AI

arXiv:2602.04003v3 Announce Type: replace

Most adversarial threats in artificial intelligence (AI) target the computational behavior of models rather than the humans who rely on them. Yet modern AI systems increasingly operate within human decision loops, where users interpret and act on model recommendations. Large Language Models (LLMs) generate fluent natural-language explanations that shape how users perceive and trust AI outputs, revealing a new attack surface at the cognitive layer: the communication channel between AI and its users. We.