Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios

arXiv:2604.06742v1 Announce Type: cross Large Language Models (LLMs) are driving a shift towards intent-driven development, where agents build complete software from scratch. However, existing benchmarks fail to assess this 0-to-1 generation capability due to two limitations: reliance on predefined scaffolds that ignore repository structure planning, and rigid white-box unit testing that lacks end-to-end behavioral validation. To bridge this gap, we