Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models

arXiv:2508.08139v3 Announce Type: replace-cross Large Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation, which poses increasing risks in multi-turn or agentic applications where outputs may be reused as context. In this work, we investigate how in-context information influences model behavior and whether LLMs can identify their own unreliable responses. We propose a reliability estimation method that uses token-level uncertainty to guide the aggregation of internal model representations.
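One way to picture uncertainty-guided aggregation is sketched below. The abstract does not specify the aggregation rule, so everything here is an assumption made for illustration: the use of Shannon entropy as the token-level uncertainty, the entropy-proportional weighting, and the hypothetical names `token_entropy` and `uncertainty_weighted_pool`. It should not be read as the authors' method.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertainty_weighted_pool(hidden_states, token_probs):
    """Pool per-token hidden vectors into one vector, weighting by entropy.

    hidden_states: list of equal-length float lists, one per generated token.
    token_probs:   list of probability distributions, one per generated token.

    Assumption (not from the paper): tokens the model was less sure about
    receive proportionally more weight in the pooled representation.
    """
    if not hidden_states or len(hidden_states) != len(token_probs):
        raise ValueError("need one non-empty distribution per hidden state")

    entropies = [token_entropy(p) for p in token_probs]
    total = sum(entropies)
    n = len(hidden_states)

    # If every token was predicted with full certainty, fall back to a plain mean.
    if total == 0.0:
        weights = [1.0 / n] * n
    else:
        weights = [e / total for e in entropies]

    dim = len(hidden_states[0])
    return [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]


# Toy example: the first token is certain, the second is a coin flip,
# so the pooled vector is taken entirely from the second token.
pooled = uncertainty_weighted_pool(
    hidden_states=[[1.0, 0.0], [0.0, 1.0]],
    token_probs=[[1.0, 0.0], [0.5, 0.5]],
)
```

Under these assumptions, the pooled vector could then be passed to a small classifier that predicts whether the response is reliable; the choice of classifier and of which layer supplies the hidden states is left open here.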