AI RESEARCH
UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing
arXiv cs.LG
arXiv:2605.18796v1 Announce Type: new

LLM cascades and model routing promise lower inference cost by sending easy queries to a small model and escalating hard ones to a large model, but most deployed routers use uncalibrated confidence scores and require per-workload threshold tuning. We present UCCI, a calibration-first router that maps token-level margin uncertainty to a per-query error probability via isotonic regression and selects the escalation threshold by constrained cost minimization.
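
The two steps named in the abstract, isotonic calibration followed by constrained threshold selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the aggregation of token margins into one score, the cost model (the small model always runs, the large model runs only on escalation), the fixed large-model error rate `err_large`, and the toy data are all assumptions made for illustration. Tied uncertainty scores get no special handling.

```python
from bisect import bisect_left

def margin_uncertainty(token_margins):
    # Hypothetical aggregation: 1 minus the mean top-1/top-2 probability margin.
    return 1.0 - sum(token_margins) / len(token_margins)

def fit_isotonic(xs, ys):
    """Pool-adjacent-violators fit of a non-decreasing step function."""
    blocks = []  # each block holds [sum_of_labels, count, largest_x]
    for x, y in sorted(zip(xs, ys)):
        blocks.append([float(y), 1, x])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n, hi = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = hi
    uppers = [b[2] for b in blocks]
    means = [b[0] / b[1] for b in blocks]
    return uppers, means

def predict(model, x):
    """Calibrated error probability for uncertainty score x."""
    uppers, means = model
    i = bisect_left(uppers, x)
    return means[min(i, len(means) - 1)]  # clamp scores above the fitted range

def choose_threshold(probs, cost_small, cost_large, err_large, max_error):
    """Cheapest threshold whose expected error stays within max_error.

    A query is escalated when its calibrated error probability exceeds the
    threshold. Returns (threshold, expected_cost, expected_error), or None
    if no candidate satisfies the constraint.
    """
    n = len(probs)
    best = None
    # A threshold of -1.0 escalates everything; the largest prob escalates nothing.
    for t in [-1.0] + sorted(set(probs)):
        kept = [p for p in probs if p <= t]
        escalated = n - len(kept)
        exp_error = (sum(kept) + escalated * err_large) / n
        exp_cost = cost_small + cost_large * escalated / n
        if exp_error <= max_error and (best is None or exp_cost < best[1]):
            best = (t, exp_cost, exp_error)
    return best

# Toy calibration set: uncertainty scores and whether the small model erred.
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90]
errors = [0, 0, 1, 0, 0, 1, 1, 1]

model = fit_isotonic(scores, errors)
probs = [predict(model, s) for s in scores]
result = choose_threshold(probs, cost_small=1.0, cost_large=10.0,
                          err_large=0.05, max_error=0.2)
threshold, expected_cost, expected_error = result
```

Because expected cost in this sketch grows with the number of escalated queries, the search amounts to finding the highest threshold that still meets the error budget. In practice the threshold would be chosen on held-out data rather than on the calibration set itself.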