Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning

arXiv:2604.01170v1 (Announce Type: cross)

While test-time scaling has enabled large language models to solve highly difficult tasks, state-of-the-art results come at exorbitant compute costs. These inefficiencies can be attributed to the miscalibration of post-trained language models and to the lack of calibration in popular sampling techniques. Here, we present Online Reasoning Calibration (ORCA), a framework for calibrating the sampling process that draws upon conformal prediction and test-time training.
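The abstract does not say how ORCA applies conformal prediction. For readers new to the technique, here is a minimal sketch of generic split conformal calibration. It is not ORCA's method. The function names, and the idea of scoring candidate reasoning answers with a nonconformity score (for example, one minus a model's confidence), are illustrative assumptions. The sketch picks a threshold from held-out calibration scores. Under exchangeability, a new example's score falls at or below that threshold with probability at least 1 - alpha.

```python
import math
from typing import Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split conformal quantile (hypothetical helper, generic technique).

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    which gives marginal coverage >= 1 - alpha under exchangeability.
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha: accept everything.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidate_scores: dict[str, float], threshold: float) -> set[str]:
    """Keep every candidate whose nonconformity score is within the threshold."""
    return {c for c, s in candidate_scores.items() if s <= threshold}


if __name__ == "__main__":
    # Hypothetical nonconformity scores from a held-out calibration set.
    cal = [float(i) for i in range(1, 11)]
    q = conformal_threshold(cal, alpha=0.2)  # ceil(11 * 0.8) = 9 -> 9th smallest
    print(q)
    print(sorted(prediction_set({"a": 3.0, "b": 9.0, "c": 9.5}, q)))
```

The finite-sample correction (n + 1 rather than n) is the standard split conformal choice. With few calibration points and a small alpha, the threshold becomes infinite, meaning no candidate can be safely excluded.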