Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs

arXiv:2603.20698v1 Announce Type: cross Multimodal Large Language Models (MLLMs) have demonstrated remarkable potential in medical image analysis. However, their application in gastrointestinal endoscopy is currently hindered by two critical limitations: the misalignment between general model reasoning and standardized clinical cognitive pathways, and the lack of causal association between visual features and diagnostic outcomes. In this paper, we propose a novel Clinical-Cognitive-Aligned (CogAlign) framework to address these challenges.