Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs

arXiv cs.AI

arXiv:2605.13530v1 Announce Type: cross

Surgical scene understanding is a cornerstone of computer-assisted intervention. Recent advances, particularly in surgical image segmentation, have driven progress, but real-world clinical applications require a holistic understanding that jointly captures procedural context, semantic reasoning, and precise visual grounding. Existing approaches, however, typically address these components in isolation, which leads to fragmented representations and limited semantic consistency.