Zoom Consistency: A Free Confidence Signal in Multi-Step Visual Grounding Pipelines
arXiv CS.AI
arXiv:2604.15376v1 Announce Type: cross
Multi-step zoom-in pipelines are widely used for GUI grounding, yet the intermediate predictions they produce are typically discarded after coordinate remapping. We observe that these intermediate outputs already carry a confidence signal at no extra cost: zoom consistency, defined as the distance between the model's step-2 prediction and the center of the crop. Unlike log-probabilities or token-level uncertainty, zoom consistency is a geometric quantity measured in a shared coordinate space, so it can be compared directly across architecturally different VLMs without calibration.
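The abstract does not say how the crop is built or how the distance is normalized, so the following is only a minimal Python sketch under stated assumptions. It assumes that step 1 yields a point on the full screenshot, that a fixed-size crop is centered on that point, that step 2 predicts a point in crop-local pixels, and that zoom consistency is the Euclidean distance from that point to the crop center, optionally divided by the crop size. `Crop`, `crop_centered_on`, `remap_to_full`, `zoom_consistency`, and `pick_most_consistent` are hypothetical names. Choosing the lowest-scoring run across models is one illustrative use of cross-model comparability, not the paper's stated method.

```python
import math
from dataclasses import dataclass


@dataclass
class Crop:
    """Hypothetical crop record: top-left corner and size, in full-screenshot pixels."""
    left: float
    top: float
    width: float
    height: float


def crop_centered_on(point, width, height):
    """Step 1 -> crop. Assumption: the crop is centered on the step-1 point (no border clamping)."""
    x, y = point
    return Crop(x - width / 2, y - height / 2, width, height)


def remap_to_full(pred_in_crop, crop):
    """Map a step-2 prediction from crop-local pixels back to full-screenshot pixels."""
    px, py = pred_in_crop
    return (crop.left + px, crop.top + py)


def zoom_consistency(pred_in_crop, crop, normalize=True):
    """Distance between the step-2 prediction and the crop center.

    With normalize=True the offset is divided by the crop size, so the score is in
    crop-relative units (an assumption about the 'shared coordinate space').
    """
    dx = pred_in_crop[0] - crop.width / 2
    dy = pred_in_crop[1] - crop.height / 2
    if normalize:
        dx /= crop.width
        dy /= crop.height
    return math.hypot(dx, dy)


def pick_most_consistent(runs):
    """Hypothetical use: among (name, pred_in_crop, crop) runs from different VLMs,
    return the run whose step-2 prediction lies closest to its own crop center."""
    return min(runs, key=lambda run: zoom_consistency(run[1], run[2]))


# Usage: step 1 predicted (640, 360) on the full screenshot.
crop = crop_centered_on((640, 360), 400, 300)  # left=440.0, top=210.0
step2 = (230, 150)                             # step-2 output, crop-local pixels
print(remap_to_full(step2, crop))              # (670.0, 360.0)
print(zoom_consistency(step2, crop))           # 0.075
```

Dividing by crop width and height keeps the score in crop-relative units, which is one plausible reading of "shared coordinate space". A real pipeline would also have to handle crops clamped at image borders, where the step-1 point no longer sits at the crop center.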