AI RESEARCH

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

arXiv cs.CV

arXiv:2503.16549v2 (Announce Type: replace)

Despite strong results on many tasks, multimodal large language models (MLLMs) still underperform on visual mathematical problem solving, especially in reliably perceiving and interpreting diagrams. Inspired by human problem-solving, we hypothesize that the ability to extract meaningful information from diagrams is pivotal, as it directly conditions subsequent inference. Hence, we