The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
arXiv CS.AI
arXiv:2604.14363v1 Announce Type: cross
Multimodal language models systematically underperform on visual perception tasks, yet the structure underlying this failure remains poorly understood. We propose centroid replacement, which collapses each token to its nearest K-means centroid, as a controlled probe of modal dependence.
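The abstract does not say which token representations are clustered, at which layer, or how K is chosen. The following minimal sketch therefore assumes that each token is a plain embedding vector. It uses standard Lloyd's K-means written in pure Python to show what "collapsing each token to its nearest centroid" means mechanically. The helper names `kmeans` and `centroid_replace` are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to `point` (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's K-means; returns k centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise the centroids with k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if its cluster came out empty.
        new_centroids: List[Vector] = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    [sum(m[i] for m in members) / len(members) for i in range(len(c))]
                )
            else:
                new_centroids.append(c)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def centroid_replace(tokens: List[Vector], k: int, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: replace every token vector with its nearest
    K-means centroid. Returns the replaced vectors and each token's cluster id."""
    centroids = kmeans(tokens, k, seed=seed)
    ids = [_nearest(t, centroids) for t in tokens]
    return [list(centroids[j]) for j in ids], ids


if __name__ == "__main__":
    # Toy "tokens": two well-separated groups of 2-D embeddings.
    toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    replaced, ids = centroid_replace(toy, k=2)
    print(replaced, ids)
```

After replacement, the sequence contains at most `k` distinct vectors. Within-cluster detail is erased, and `k` sets how coarse that erasure is. In this sketch, the choice of which tokens to replace (for example, only visual tokens) is left to the caller.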