AI RESEARCH

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

arXiv CS.AI

ArXi:2605.02187v1 Announce Type: cross Bring-Your-Own-Key (BYOK) agent architectures let users route LLM traffic through third-party relays, creating a critical integrity gap: a malicious relay can modify an aligned LLM response after generation but before agent execution. We formalize this post-alignment tampering threat and show that, without end-to-end integrity, the relay can observe, suppress, or replace downstream messages, making even perfectly aligned LLMs ineffective against such attacks.