Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

arXiv:2603.18472v1 Announce Type: new While Multimodal Large Language Models (MLLMs) have achieved remarkable success in interpreting natural scenes, their ability to process discrete symbols -- the fundamental building blocks of human cognition -- remains a critical open question. Unlike continuous visual data, symbols such as mathematical formulas, chemical structures, and linguistic characters require precise, deeper interpretation. This paper