AI RESEARCH

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

arXiv cs.AI

arXiv:2604.06436v2 Announce Type: cross

We prove that no continuous, utility-preserving wrapper defense (a function $D: X\to X$ that preprocesses inputs before the model sees them) can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail.
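
A short sketch may help show why a result of this shape is plausible. It is an illustration under stated assumptions, not necessarily the paper's own proof. The score $f$, the threshold $\tau$, and the safe set $S$ are hypothetical names introduced here. The sketch assumes that $X$ is a connected Hausdorff space, that $f: X\to\mathbb{R}$ is continuous, and that "utility-preserving" means $D$ leaves already-safe prompts unchanged.

```latex
% Assumptions (illustrative, not taken from the paper):
%   X connected and Hausdorff; f : X -> R continuous; D : X -> X continuous;
%   at least one safe and at least one unsafe prompt exist;
%   utility preservation is read as D(x) = x for every safe x.
\begin{align*}
  S &= \{\, x \in X : f(x) < \tau \,\}
      && \text{(safe set; open, nonempty, and } S \neq X\text{)} \\
  \overline{S} \setminus S &\neq \emptyset
      && \text{(in a connected space, such an } S \text{ cannot also be closed)} \\
  f(z) &= \tau \quad \text{for } z \in \overline{S} \setminus S
      && \text{(} f(z) \le \tau \text{ by continuity, } f(z) \ge \tau \text{ since } z \notin S\text{)} \\
  \{\, x : D(x) = x \,\} &\supseteq \overline{S}
      && \text{(the fixed-point set is closed in a Hausdorff space and contains } S\text{)} \\
  f(D(z)) &= f(z) = \tau \not< \tau
      && \text{(so } D(z) \text{ is not strictly safe)}
\end{align*}
```

Under these assumptions the defense is pinned at the boundary of the safe set: continuity forces $D$ to fix every boundary point $z$, and at those points the score equals the threshold, so the strict inequality $f(D(z)) < \tau$ cannot hold everywhere.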