TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

arXiv:2505.11737v4 Announce Type: replace-cross While Large Language Models (LLMs) have demonstrated impressive capabilities, their output quality remains inconsistent across various application scenarios, making it difficult to identify trustworthy responses, especially in complex tasks requiring multi-step reasoning. In this paper, we propose a Token-level Uncertainty estimation framework for Reasoning (TokUR) that enables LLMs to self-assess and self-improve their responses in mathematical reasoning. Specifically, we.