WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

arXiv:2604.18224v1 Announce Type: cross Large language models are rapidly evolving into interactive coding agents capable of end-to-end web coding, yet existing benchmarks evaluate only narrow slices of this capability, typically text-conditioned generation with static-correctness metrics, leaving visual fidelity, interaction quality, and codebase-level reasoning largely unmeasured. We