FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting

arXiv:2604.27974v1 Announce Type: new Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and why agents fail. To address this gap, we