Structure Causal Models and LLMs Integration in Medical Visual Question Answering

arXiv:2505.02703v2

Medical Visual Question Answering (MedVQA) aims to answer medical questions based on medical images. However, the complexity of medical data gives rise to confounders that are difficult to observe, so bias between images and questions is inevitable. Such cross-modal bias makes it challenging to infer medically meaningful answers.