Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

arXiv:2603.26648v1 Announce Type: cross Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we