AI RESEARCH
ClawSafety: "Safe" LLMs, Unsafe Agents
arXiv CS.AI
arXiv:2604.01438v2 Announce Type: replace Personal AI agents like OpenClaw run with elevated privileges on users' local machines, where a single successful prompt injection can leak credentials, redirect financial transactions, or destroy files. This threat goes well beyond conventional text-level jailbreaks, yet existing safety evaluations fall short: most test models in isolated chat settings, rely on synthetic environments, and do not account for how the agent framework itself shapes safety outcomes. We.