Multi-modal user interface control detection using cross-attention

arXiv cs.CV

arXiv:2604.06934v1 Announce Type: new

Detecting user interface (UI) controls from software screenshots is a critical task for automated testing, accessibility, and software analytics. It remains challenging, however, because of visual ambiguities, design variability, and the lack of contextual cues in pixel-only approaches. In this paper, we