
Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models

arXiv CS.AI

arXiv:2604.09687v1 Announce Type: cross

Vision-Language Models (VLMs) excel on many multimodal reasoning benchmarks. However, these evaluations often do not require an exhaustive readout of the image, so they can obscure failures to faithfully capture all visual details. We