AI RESEARCH
Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs
arXiv CS.CL
arXiv:2604.05522v1 Announce Type: new
Omni Large Language Models (Omni-LLMs) have demonstrated impressive capabilities in holistic multimodal perception, yet they consistently falter in complex scenarios that require synergistic omni-modal reasoning. Beyond understanding the global multimodal context, effective reasoning also depends on fine-grained cross-modal alignment, especially the identification of shared referents across modalities. This aspect, however, has been largely overlooked.