AI RESEARCH

Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization

arXiv CS.AI

arXiv:2605.06308v1 Announce Type: new Reliable confidence estimation enables safe deployment of chain-of-thought (CoT) reasoning through text-only APIs. Yet the dominant black-box baseline, self-consistency over K samples, has a cost that grows linearly with K and ignores the geometry of the reasoning trace. We propose a black-box trajectory-confidence score: we embed a CoT as a sliding-window trajectory and measure its convergence to external answer anchors with a one-parameter softmax. The method needs no logits, hidden states, or supervised calibrators.
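The abstract does not give implementation details, so the following is only a minimal sketch of how such a score could be assembled, not the authors' method. Everything named here is an assumption made for illustration: the hashed bag-of-words `toy_embed` stands in for a real sentence-embedding model, the window size and stride are arbitrary, cosine similarity is one possible notion of closeness to an anchor, and the single `temperature` argument plays the role of the one softmax parameter.

```python
import math
import zlib
from typing import Dict, List, Tuple

DIM = 64  # hypothetical embedding size for the toy embedder


def toy_embed(text: str) -> List[float]:
    """Stand-in embedder (assumption): hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def windows(tokens: List[str], size: int, stride: int) -> List[str]:
    """Split a token list into overlapping windows; the last one ends at the final token."""
    if len(tokens) <= size:
        return [" ".join(tokens)]
    starts = list(range(0, len(tokens) - size + 1, stride))
    if starts[-1] != len(tokens) - size:
        starts.append(len(tokens) - size)
    return [" ".join(tokens[s:s + size]) for s in starts]


def anchor_softmax(point: List[float],
                   anchors: Dict[str, List[float]],
                   temperature: float) -> Dict[str, float]:
    """Softmax over cosine similarities to each anchor, with one temperature parameter."""
    sims = {k: sum(a * b for a, b in zip(point, v)) for k, v in anchors.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {k: math.exp((s - top) / temperature) for k, s in sims.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def trajectory_confidence(cot: str,
                          anchor_texts: Dict[str, str],
                          window: int = 8,
                          stride: int = 4,
                          temperature: float = 0.1) -> Tuple[str, float, List[float]]:
    """Return (predicted answer label, confidence at the last window, per-window trace)."""
    anchors = {label: toy_embed(text) for label, text in anchor_texts.items()}
    trajectory = [toy_embed(w) for w in windows(cot.split(), window, stride)]
    per_window = [anchor_softmax(p, anchors, temperature) for p in trajectory]
    final = per_window[-1]
    answer = max(final, key=final.get)
    # The trace shows how probability mass on the chosen anchor evolves along the CoT.
    return answer, final[answer], [p[answer] for p in per_window]


if __name__ == "__main__":
    cot = ("first add twelve and thirty to get forty two "
           "then check the units so the answer is forty two")
    labels = {"A": "the answer is forty two", "B": "the answer is seventeen"}
    print(trajectory_confidence(cot, labels))
```

In this sketch a single sampled CoT yields both a confidence value and a per-window trace, whereas self-consistency would need K separate samples to produce one estimate. Whether the final-window probability, a trend over the trace, or some other summary is the right readout is left open here.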