Improving Semantic Uncertainty Quantification in Language Model Question-Answering via Token-Level Temperature Scaling

arXiv:2604.07172v1 Announce Type: new Calibration is central to reliable semantic uncertainty quantification, yet prior work has largely evaluated discrimination alone. Because calibration and discrimination capture distinct aspects of uncertainty, evaluating only discrimination gives an incomplete picture. We address this gap by systematically evaluating both aspects across a broad set of confidence measures.
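
To illustrate why discrimination and calibration are distinct, the sketch below scores the same toy predictions twice: once with the original confidences and once after a strictly increasing rescaling that makes every confidence larger. It assumes AUROC as the discrimination metric and expected calibration error (ECE) with equal-width bins as the calibration metric; the abstract does not name the metrics used, so these choices, the helper names `auroc` and `ece`, and the toy data are all illustrative assumptions rather than the paper's method.

```python
def auroc(conf, correct):
    """Discrimination: probability that a correct answer receives higher
    confidence than an incorrect one (ties count as half a win)."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ece(conf, correct, n_bins=10):
    """Calibration: weighted mean gap between average confidence and
    accuracy within equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((c, y))
    total = len(conf)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        err += len(b) / total * abs(avg_conf - acc)
    return err


# Hypothetical toy data: 1 = answer judged correct, 0 = incorrect.
correct = [1, 1, 1, 0, 1, 0, 0, 0]
conf = [0.95, 0.85, 0.75, 0.65, 0.65, 0.45, 0.35, 0.25]

# Strictly increasing rescaling: the ranking is unchanged, but every
# confidence is pushed toward 1 (an overconfident model).
overconfident = [0.5 + 0.5 * c for c in conf]

print("AUROC:", auroc(conf, correct), auroc(overconfident, correct))
print("ECE:  ", ece(conf, correct), ece(overconfident, correct))
```

Because AUROC depends only on how the confidences rank the answers, the rescaling leaves it unchanged, while ECE increases because the stated confidences no longer match the observed accuracy. A discrimination-only evaluation would therefore rate both sets of confidences as equally good.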