ReasonMap: Towards Fine-Grained Visual Reasoning from Transit Maps

arXiv:2505.18675v3 Announce Type: replace-cross Multimodal large language models (MLLMs) have demonstrated significant progress in semantic scene understanding and text-image alignment, with reasoning variants enhancing performance on complex tasks involving mathematics and logic. To bridge this gap, we