DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

arXiv:2605.15542v1 Announce Type: new

GUI agents powered by Multimodal Large Language Models (MLLMs) have demonstrated impressive capability in understanding and executing user instructions. However, accurately grounding instruction-relevant elements in high-resolution screenshots cluttered with irrelevant UI components remains challenging for existing approaches. Inspired by how humans dynamically adjust their perceptual scope to locate task-related regions on complex screens, we propose DRS-GUI, a