Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

arXiv:2605.18827v1 Announce Type: cross

Multiple-choice QA benchmarks usually evaluate small language models (SLMs) as direct answerers, but deployed language-model systems increasingly rely on external scaffolds such as tools, code, and repeated model calls. We