VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning

arXiv:2604.09529v1 Announce Type: cross Large Vision-Language Models (LVLMs) achieve strong multimodal reasoning but frequently hallucinate and give incorrect responses with high certainty, which hinders their use in high-stakes domains. Existing verbalized confidence calibration methods, largely developed for text-only LLMs, typically optimize a single holistic confidence score using binary answer-level correctness.