Continual Calibration: Coverage Can Collapse Before Accuracy in Lifelong LLM Fine-Tuning

arXiv:2604.23987v1

Continual learning for large language models is typically evaluated through accuracy retention under sequential fine-tuning. We argue that this perspective is incomplete, because the reliability of uncertainty estimates can degrade earlier and more sharply than top-1 performance. We study this empirically by measuring conformal coverage and calibration error on sequentially fine-tuned models across three model families and eight task sequences drawn primarily from classification and multiple-choice benchmarks.
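To make the two measured quantities concrete, the following is a minimal sketch of how conformal coverage and calibration error could be computed from a model's class probabilities. The abstract does not specify the nonconformity score or the calibration metric, so the choices here are assumptions: split conformal prediction with the score 1 - p(true class), and expected calibration error (ECE) with equal-width confidence bins. All function names are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set.

    Assumed nonconformity score: 1 - probability assigned to the true class.
    """
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every label is included.
        return float("inf")
    return float(scores[k - 1])

def empirical_coverage(test_probs, test_labels, qhat):
    """Fraction of test points whose true label falls in the prediction set."""
    n = len(test_labels)
    true_scores = 1.0 - test_probs[np.arange(n), test_labels]
    return float(np.mean(true_scores <= qhat))

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width bins of top-1 confidence (an assumed metric)."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of points.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

if __name__ == "__main__":
    # Synthetic probabilities standing in for one fine-tuning checkpoint.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(4), size=400)
    labels = rng.integers(0, 4, size=400)
    qhat = conformal_threshold(probs[:200], labels[:200], alpha=0.1)
    print("coverage:", empirical_coverage(probs[200:], labels[200:], qhat))
    print("ECE:", expected_calibration_error(probs[200:], labels[200:]))
```

Under this setup, one would recompute the coverage and ECE after each stage of sequential fine-tuning, holding the threshold or the calibration split fixed as the protocol requires, and compare their drift against top-1 accuracy on the same tasks.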