Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

arXiv:2604.19444v1 Announce Type: new

Reasoning language models can solve increasingly complex tasks, but they struggle to produce the calibrated confidence estimates needed for reliable deployment. Existing calibration methods usually depend on labeled data or on repeated sampling at inference time, which makes them impractical in many settings. We