IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding

arXiv:2508.09456v4 Announce Type: replace-cross Recent advances in vision-language models (VLMs) have significantly enhanced the visual grounding task, which involves locating objects in an image based on natural language queries. Despite these advancements, the security of VLM-based grounding systems has not been thoroughly investigated. This paper reveals a novel and realistic vulnerability: the first multi-target backdoor attack on VLM-based visual grounding.