When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA

ArXi:2511.01458v2 Announce Type: replace Safety and reliability are critical for deploying visual question answering (VQA) systems in surgery, where incorrect or ambiguous responses can cause patient harm. A key limitation of existing uncertainty estimation methods, such as Semantic Nearest Neighbor Entropy (SNNE), is that they do not explicitly account for the conditioning question. As a result, they may assign high confidence to answers that are semantically consistent yet misaligned with the clinical question, especially under variation in question phrasing.