Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models

arXiv:2605.17310v1

Existing adversarial attacks on vision-language models (VLMs) can steer model outputs toward attacker-specified target responses, but their effectiveness often degrades when the same perturbed input is paired with different textual queries. This paper studies cross-query response manipulation, where a single adversarial example is expected to remain effective across diverse user queries.
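To make the cross-query setting concrete, the sketch below shows one plausible way to score a single adversarial input against several queries. It is an illustration only, not the paper's evaluation protocol: the function name `cross_query_success_rate`, the `respond` callable standing in for a VLM, and the substring-match criterion are all assumptions introduced here.

```python
from typing import Callable, Sequence


def cross_query_success_rate(
    respond: Callable[[object, str], str],
    adversarial_image: object,
    queries: Sequence[str],
    target: str,
) -> float:
    """Return the fraction of queries whose response contains the target.

    `respond` is a hypothetical stand-in for a VLM: it takes an image and a
    text query and returns the generated text. Substring matching is an
    assumed success criterion, chosen only for simplicity.
    """
    if not queries:
        raise ValueError("queries must be non-empty")
    # The same adversarial image is reused for every query.
    hits = sum(
        target.lower() in respond(adversarial_image, query).lower()
        for query in queries
    )
    return hits / len(queries)
```

An attack that works for only one query would score close to `1 / len(queries)` under this measure, while a query-agnostic attack would score close to 1.0.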