How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals

arXiv:2604.22271v1

Large language models can detect their own errors and sometimes correct them without external feedback, but the underlying mechanisms remain unknown. We investigate this through the lens of second-order models of confidence from decision neuroscience. In a first-order system, confidence derives from the generation signal itself and is therefore maximal for the chosen response, precluding error detection.
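
The first-order argument can be illustrated with a minimal sketch. It assumes that the response is chosen greedily (argmax) and that confidence is read off as the softmax probability of the chosen option. The logits and all function names are hypothetical, chosen for illustration only; this is not the paper's method or model.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_order_decision(logits):
    """Pick a response and derive confidence from the same generation signal.

    Assumption: greedy selection, confidence = probability of the chosen option.
    """
    probs = softmax(logits)
    choice = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[choice]
    return choice, confidence, probs


def flags_own_error(choice, probs):
    """Return True only if some alternative is rated above the chosen option.

    Under the assumptions above this can never happen, because the chosen
    option is by construction the one with the highest probability.
    """
    return any(p > probs[choice] for p in probs)


# Hypothetical scores for four candidate responses.
logits = [1.2, 0.3, 2.5, -0.7]
choice, confidence, probs = first_order_decision(logits)
error_flagged = flags_own_error(choice, probs)
```

Under these assumptions `error_flagged` is always `False`, whatever logits are supplied: a confidence signal taken from the generation signal cannot rank an alternative above the response that signal selected. A second-order account would instead need a confidence estimate that can diverge from the generation signal.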