No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
arXiv CS.AI
arXiv:2605.13044v1 Announce Type: cross

LLM-powered agents can silently delete documents, leak credentials, or transfer funds in response to a routine user request, not because the agent was attacked, but because the skill it invoked broke its own declared safety rules. We call these specification violations: cases in which benign inputs cause a skill to breach the natural-language guardrails in its own specification, typically because the guardrail's semantics are undefined for autonomous execution, or because the implementation silently ignores the documented constraint.
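To make the second failure mode concrete, here is a minimal Python sketch of a skill whose implementation silently ignores its documented constraint. Everything in it is a hypothetical illustration and not the paper's method or any real agent framework's API: the `Skill` class, the `protected` set, the file names, and the `violated_guardrail` check are all assumptions. The point is only that an ordinary request, with no adversarial input, is enough to trigger the breach.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical skill: a prose guardrail plus an implementation."""
    name: str
    guardrail: str  # natural-language rule declared in the skill's spec
    protected: set = field(default_factory=set)  # files the rule covers

    def run(self, request: str, files: set) -> set:
        """Return the files this skill would delete for the request."""
        # Illustrated bug: the implementation never consults
        # self.protected, so the documented constraint is ignored.
        if "clean up" in request.lower():
            return {f for f in files if f.endswith((".tmp", ".bak"))}
        return set()


def violated_guardrail(skill: Skill, deleted: set) -> list:
    """List protected files that the skill deleted anyway."""
    return sorted(deleted & skill.protected)


# Assumed example data: a benign request and an ordinary folder.
skill = Skill(
    name="folder-cleaner",
    guardrail="Never delete files the user has marked as protected.",
    protected={"thesis.bak"},
)
files = {"notes.tmp", "thesis.bak", "report.docx"}
deleted = skill.run("Please clean up my project folder", files)
violations = violated_guardrail(skill, deleted)
print(violations)
```

In this sketch the guardrail exists only as prose in the specification, so nothing enforces it at run time. The check in `violated_guardrail` is one simple way to observe the gap from outside the skill, assuming the guardrail can be translated into a set of protected resources.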