MobileDev-Bench: A Comprehensive Benchmark for Evaluating Language Models on Mobile Application Development

arXiv:2603.24946v1 Announce Type: cross Large language models (LLMs) have shown strong performance on automated software engineering tasks, yet existing benchmarks focus primarily on general-purpose libraries or web applications, leaving mobile application development largely unexplored despite its strict platform constraints, framework-driven lifecycles, and complex platform API interactions. We