E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task

arXiv:2510.14509v3 Announce Type: replace-cross The rapid advancement in large language models (LLMs) has demonstrated significant potential in End-to-End Software Development (E2ESD). However, existing E2ESD benchmarks are limited by coarse-grained requirement specifications and unreliable evaluation protocols, hindering a true understanding of current framework capabilities.