AI RESEARCH

MedSPOT: A Workflow-Aware Sequential Grounding Benchmark for Clinical GUI

arXiv cs.CV

arXiv:2603.19993v1

Despite the rapid progress of Multimodal Large Language Models (MLLMs), their ability to perform reliable visual grounding in high-stakes clinical software environments remains underexplored. Existing GUI benchmarks largely focus on isolated, single-step grounding queries, overlooking the sequential, workflow-driven reasoning required in real-world medical interfaces, where tasks evolve across independent steps and dynamic interface states. We