AI RESEARCH
The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?
arXiv CS.AI
arXiv:2604.06436v2 Announce Type: cross We prove that no continuous, utility-preserving wrapper defense (a function $D: X\to X$ that preprocesses inputs before the model sees them) can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail.
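The claim can be made concrete with a one-dimensional toy, offered here as a rough sketch rather than the paper's construction. Everything in it is an assumption for illustration: the prompt space is the interval [0, 1], `harm_score` is a hypothetical continuous score, `TAU` is a hypothetical threshold, "strictly safe" is read as a score strictly below `TAU`, and "utility-preserving" is read as leaving already-safe inputs unchanged. Under those assumptions, a continuous defense that fixes every safe input is forced to fix the boundary point as well, so its outputs can reach the threshold but not stay strictly below it, while a defense that does achieve strict safety has to jump.

```python
TAU = 0.5  # hypothetical safety threshold (assumption, not from the paper)


def harm_score(x: float) -> float:
    """Hypothetical continuous harm score on the toy prompt space [0, 1]."""
    return x


def is_strictly_safe(x: float) -> bool:
    """Strict safety in this toy: the score is strictly below the threshold."""
    return harm_score(x) < TAU


def clamp_defense(x: float) -> float:
    """Continuous wrapper: safe inputs pass through, unsafe ones are clamped to TAU."""
    return min(x, TAU)


def reset_defense(x: float) -> float:
    """Discontinuous wrapper: safe inputs pass through, unsafe ones jump to 0."""
    return x if is_strictly_safe(x) else 0.0


def grid(n: int = 1001) -> list[float]:
    """Evenly spaced sample of the toy prompt space, including the boundary 0.5."""
    return [i / (n - 1) for i in range(n)]


def preserves_utility(defense) -> bool:
    """True if the defense leaves every sampled safe input unchanged."""
    return all(defense(x) == x for x in grid() if is_strictly_safe(x))


def failures(defense) -> list[float]:
    """Sampled inputs whose defended version is not strictly safe."""
    return [x for x in grid() if not is_strictly_safe(defense(x))]


if __name__ == "__main__":
    for name, defense in [("clamp", clamp_defense), ("reset", reset_defense)]:
        print(name, "preserves utility:", preserves_utility(defense),
              "| failing inputs:", len(failures(defense)))
```

In this toy, the clamp keeps continuity and utility but every unsafe input lands exactly on the threshold, which is not strictly safe; the reset keeps utility and strict safety but is discontinuous at the boundary. The sketch only checks a finite grid, so it illustrates the trade-off rather than proving it.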