AI RESEARCH

Predictive Entropy Links Calibration and Paraphrase Sensitivity in Medical Vision-Language Models

arXiv cs.LG

arXiv:2604.08941v1 Announce Type: new

Medical Vision-Language Models (VLMs) suffer from two failure modes that threaten safe deployment: miscalibrated confidence and sensitivity to question rephrasing. We show they share a common cause, proximity to the decision boundary, by benchmarking five uncertainty quantification methods on MedGemma-4B-IT across in-distribution (MIMIC-CXR) and out-of-distribution (PadChest) chest X-ray datasets, with cross-architecture validation on LLaVA-Rad-7B.
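To make the "proximity to the decision boundary" idea concrete, here is a minimal Python sketch. It is not the paper's method or code. It assumes a binary yes/no question where the model assigns probability `p_yes` to "yes", and it models a rephrasing as a small additive shift to the logit, which is a simplifying assumption made only for illustration. The function names `binary_entropy` and `flips_under_shift` are hypothetical.

```python
import math


def binary_entropy(p_yes: float) -> float:
    """Shannon entropy in nats of a yes/no prediction with P(yes) = p_yes."""
    if p_yes <= 0.0 or p_yes >= 1.0:
        return 0.0  # a fully certain prediction carries no entropy
    p_no = 1.0 - p_yes
    return -(p_yes * math.log(p_yes) + p_no * math.log(p_no))


def flips_under_shift(p_yes: float, logit_shift: float) -> bool:
    """Return True if adding logit_shift to the logit changes the answer.

    Assumption for illustration only: a paraphrase acts as a small additive
    perturbation of the logit. Requires 0 < p_yes < 1.
    """
    logit = math.log(p_yes / (1.0 - p_yes))
    return (logit > 0.0) != (logit + logit_shift > 0.0)


# Compare a prediction near the boundary with a confident one
# under the same hypothetical perturbation.
for p in (0.55, 0.95):
    print(p, round(binary_entropy(p), 3), flips_under_shift(p, -0.5))
```

Under these assumptions, a prediction near p_yes = 0.5 has both high entropy and a high chance of flipping under a small perturbation, while a confident prediction has low entropy and keeps its answer. This is the intuition behind a single quantity tracking both failure modes.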