AI RESEARCH

SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models

arXiv CS.LG

arXiv:2603.21584v1 Announce Type: new Multimodal large language models (MLLMs) achieve strong performance by jointly processing inputs from multiple modalities, such as vision, audio, and language. However, building such models or extending them to new modalities often requires large paired datasets and substantial computational resources.