OR-VSKC: Resolving Visual-Semantic Knowledge Conflicts in Operating Rooms with Synthetic Data-Guided Alignment

ArXi:2506.22500v2 Announce Type: replace-cross Automated identification of surgical safety risks is critical for improving patient outcomes; however, Multimodal Large Language Models (MLLMs) frequently suffer from Visual-Semantic Knowledge Conflicts (VS-KC), a phenomenon where models possess safety knowledge but fail to activate it during visual inspection. Investigating this alignment gap in operating rooms (ORs) is impeded by a critical bottleneck: the scarcity and privacy constraints of real-world OR data depicting safety violations. To address this, we.