Evaluating Prompting and Execution-Based Methods for Deterministic Computation in LLMs

arXiv:2605.03227v1 Announce Type: new Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and reasoning. However, their ability to perform exact, deterministic computation remains unclear. In this work, we systematically evaluate multiple prompting strategies, including Chain-of-Thought (CoT), Least-to-Most decomposition, Program-of-Thought (PoT), and Self-Consistency (SC), on tasks requiring precise and error-free outputs, including binary counting, longest substring detection, and arithmetic evaluation. To this study, we