Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

arXiv:2605.08442v1 Persistent memory attacks against LLM agents achieve high attack success rates against open-source models. In these attacks, malicious instructions injected via RAG-retrieved documents are stored in persistent memory and executed in later sessions. However, no systematic evaluation of defense effectiveness against this attack class exists. We evaluate six defenses across four architectural layers against delayed-trigger attacks on nine open-source models (5,040 runs, N=40 per condition