An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code

arXiv:2604.23361v1 Announce Type: cross Large language models (LLMs) have demonstrated strong performance on a wide range of software engineering tasks, including code generation and analysis. However, most prior work relies on cloud-based models or specialized hardware, limiting practical applicability in privacy-sensitive or resource-constrained environments. In this paper, we present a systematic empirical evaluation of two locally deployed LLMs, LLaMA 3.2 and Mistral, for real-world Python bug detection using the BugsInPy benchmark.