LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

arXiv:2507.10610v2 Announce Type: replace-cross Graphical user interface (GUI) agents built on multimodal large language models (MLLMs) have recently demonstrated strong decision-making abilities in screen-based interaction tasks. However, they remain highly vulnerable to pop-up-based environmental injection attacks, where malicious visual elements divert model attention and lead to unsafe or incorrect actions. Existing defense methods either require costly re