AI RESEARCH

Confidence Should Be Calibrated More Than One Turn Deep

arXiv CS.CL

arXiv:2604.05397v1 Announce Type: new

Large Language Models (LLMs) are increasingly applied in high-stakes domains such as finance, healthcare, and education, where reliable multi-turn interactions with users are essential. However, existing work on confidence estimation and calibration, a major approach to building trustworthy LLM systems, largely focuses on single-turn settings and overlooks the risks and potential of multi-turn conversations. In this work, we