AI RESEARCH
Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs
arXiv CS.AI
arXiv:2602.23136v2 Announce Type: replace-cross Numerous studies have shown that multimodal LLMs process speech and images well but fail in non-intuitive ways, rendering trivial tasks such as object counting unreliable.