LLMs as Signal Detectors: Sensitivity, Bias, and the Temperature-Criterion Analogy

arXiv:2603.14893v1 Announce Type: cross Large language models (LLMs) are evaluated for calibration with metrics such as Expected Calibration Error (ECE), which conflate two distinct components: the model's ability to discriminate correct from incorrect answers (sensitivity) and its tendency toward confident or cautious responding (bias). Signal Detection Theory (SDT) separates these two components.
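To make the sensitivity/bias split concrete, here is a minimal Python sketch of the standard equal-variance Gaussian SDT measures, d' (sensitivity) and c (criterion, i.e. bias). It is an illustration under stated assumptions, not the paper's procedure: it treats a correct answer as the "signal" and a stated confidence at or above a cutoff as a "yes" response. The function name `sdt_measures`, the `threshold` parameter, the log-linear correction, and the toy data are all hypothetical choices made for this example.

```python
from statistics import NormalDist


def sdt_measures(confidences, correct, threshold=0.5):
    """Return (d_prime, criterion) under equal-variance Gaussian SDT.

    Assumed mapping (not taken from the source): "signal" trials are
    correct answers, and the model "responds yes" when its confidence
    is at or above `threshold`.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal

    pairs = list(zip(confidences, correct))
    n_signal = sum(1 for _, ok in pairs if ok)
    n_noise = len(pairs) - n_signal
    hits = sum(1 for p, ok in pairs if ok and p >= threshold)
    false_alarms = sum(1 for p, ok in pairs if not ok and p >= threshold)

    # Log-linear correction (an assumption of this sketch) keeps both
    # rates strictly inside (0, 1) so the z-transform stays finite.
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)

    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias; > 0 means cautious
    return d_prime, criterion


# Toy data, invented for illustration only.
confidences = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.4, 0.1]
correct = [True, True, True, True, False, False, False, False]

d_neutral, c_neutral = sdt_measures(confidences, correct, threshold=0.5)
d_strict, c_strict = sdt_measures(confidences, correct, threshold=0.75)
print(f"threshold=0.50: d'={d_neutral:.3f}, c={c_neutral:.3f}")
print(f"threshold=0.75: d'={d_strict:.3f}, c={c_strict:.3f}")
```

On this toy data, raising the cutoff pushes the criterion in the cautious (positive) direction. Bias is a property of where the cutoff sits, whereas sensitivity reflects how well confidence separates correct from incorrect answers; a single aggregate score such as ECE does not report the two separately.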