AI RESEARCH
Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks
arXiv cs.CV
arXiv:2604.13403v1 Announce Type: new In-context learning (ICL) enables models to adapt to new tasks via inference-time demonstrations. Despite its success in large language models, the extension of ICL to multimodal settings remains poorly understood, both in terms of its internal mechanisms and in how it differs from text-only ICL. In this work, we conduct a systematic analysis of ICL in multimodal large language models.