AI RESEARCH

SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA?

arXiv cs.CV

arXiv:2605.01911v1

Purpose: Vision-language models (VLMs) have shown promising performance in surgical visual question answering (VQA). However, existing surgical VQA datasets often contain linguistic shortcuts, where question phrasing implicitly constrains the answer space. It remains unclear whether reported performance reflects visual understanding or reliance on such linguistic shortcuts. Methods: We