AI RESEARCH

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong

arXiv cs.CL

arXiv:2501.09775v3

Multiple Choice Question (MCQ) tests are among the most widely used methods for evaluating large language models (LLMs). Besides checking whether the selected answer is correct, evaluations often consider the model's confidence, measured as the probability it assigns to its response. In this work, we investigate how LLM confidence depends on the answering approach: whether the model answers directly or reasons before responding.
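To make "confidence through the probability assigned to its response" concrete, here is a minimal sketch of one common way to compute it: a softmax over the model's scores for the answer options. The abstract does not specify the exact procedure, so the function name `option_confidence`, the four-option layout, and the example logits are all hypothetical and chosen only for illustration.

```python
import math


def option_confidence(option_logits):
    """Return (chosen option, probability assigned to it).

    Assumption: `option_logits` maps each answer label to the model's raw
    score (logit) for that label. The probabilities are a softmax restricted
    to the answer options; the paper may compute confidence differently.
    """
    # Subtract the maximum logit for numerical stability.
    max_logit = max(option_logits.values())
    exps = {label: math.exp(score - max_logit) for label, score in option_logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    # The selected answer is the most probable option.
    chosen = max(probs, key=probs.get)
    return chosen, probs[chosen]


# Hypothetical logits for a four-option question (not data from the paper).
logits = {"A": 2.0, "B": 0.5, "C": 0.1, "D": -1.0}
choice, confidence = option_confidence(logits)
print(choice, round(confidence, 3))
```

Under this setup, comparing direct answering with reasoning-first answering would amount to computing the same quantity from the option scores obtained in each mode and comparing the results separately for correct and incorrect answers.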