
MLLM-HWSI: A Multimodal Large Language Model for Hierarchical Whole Slide Image Understanding

arXiv CS.CV

arXiv:2603.23067v1

Whole Slide Images (WSIs) exhibit a hierarchical structure in which diagnostic information emerges from cellular morphology, regional tissue organization, and global context. Existing Computational Pathology (CPath) Multimodal Large Language Models (MLLMs) typically compress an entire WSI into a single embedding. This hinders fine-grained grounding and ignores how pathologists synthesize evidence across scales. We
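The contrast the abstract draws, one slide-level embedding versus evidence kept at several scales, can be illustrated with a minimal sketch. This is not the MLLM-HWSI architecture, which the truncated abstract does not describe. It assumes a slide already tiled into a grid of patch embeddings, treats patches as a stand-in for the cellular scale, mean-pools hypothetical `block` x `block` windows into region tokens, and mean-pools everything into one global token. The grid layout, mean pooling, and every function name here are illustrative assumptions.

```python
from statistics import fmean
from typing import Dict, List

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> embedding of one patch (assumed layout)


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [fmean(components) for components in zip(*vectors)]


def single_embedding(grid: Grid) -> Vector:
    """Collapse every patch into one slide vector (the single-embedding baseline)."""
    return mean_vectors([patch for row in grid for patch in row])


def region_embeddings(grid: Grid, block: int) -> List[Vector]:
    """Mean-pool non-overlapping block x block windows into region tokens (hypothetical scheme)."""
    rows, cols = len(grid), len(grid[0])
    regions = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            window = [
                grid[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            regions.append(mean_vectors(window))
    return regions


def hierarchical_tokens(grid: Grid, block: int) -> Dict[str, List[Vector]]:
    """Keep patch-, region-, and slide-level tokens side by side instead of one vector."""
    return {
        "patch": [patch for row in grid for patch in row],
        "region": region_embeddings(grid, block),
        "global": [single_embedding(grid)],
    }


if __name__ == "__main__":
    # Toy 4x4 slide with 1-D embeddings: background 0.0, one hypothetical
    # localized finding of 8.0 in the top-left patch.
    grid = [[[0.0] for _ in range(4)] for _ in range(4)]
    grid[0][0] = [8.0]

    tokens = hierarchical_tokens(grid, block=2)
    print(single_embedding(grid))             # [0.5]
    print([r[0] for r in tokens["region"]])   # [2.0, 0.0, 0.0, 0.0]
    print(max(p[0] for p in tokens["patch"]))  # 8.0
```

In the toy example the localized signal is diluted to 0.5 in the single slide vector, while the region tokens still show which quadrant it came from and the patch tokens keep its full magnitude. That retained location information is the kind of detail fine-grained grounding needs.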