Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents

ArXi:2510.07809v4 Announce Type: replace-cross Large Vision-Language Models (LVLMs) empower autonomous mobile agents, yet their security under realistic mobile deployment constraints remains underexplored. While agents are vulnerable to visual prompt injections, stealthily executing such attacks without requiring system-level privileges remains challenging, as existing methods rely on persistent visual manipulations that are noticeable to users. We uncover a consistent discrepancy between human and agent interactions: automated agents generate near-zero contact touch signals.