Causal Evidence that Language Models use Confidence to Drive Behavior

arXiv:2603.22161v1

Metacognition -- the ability to assess one's own cognitive performance -- is documented across species, with internal confidence estimates serving as a key signal for adaptive behavior. While confidence can be extracted from Large Language Model (LLM) outputs, whether models actively use these signals to regulate behavior remains a fundamental question. We investigate this through a four-phase abstention paradigm. Phase 1 established internal confidence estimates in the absence of an abstention option.