AI RESEARCH

MApLe: Multi-instance Alignment of Diagnostic Reports and Large Medical Images

arXiv cs.CV

arXiv:2604.13970v1 Announce Type: new

In diagnostic reports, experts encode complex imaging data into clinically actionable information. They describe subtle pathological findings that are meaningful within their anatomical context. Reports follow relatively consistent structures and express diagnostic information in few words, which often refer to tiny but consequential image observations. Standard vision-language models struggle to associate these informative text components with small regions in the images.