AI RESEARCH

Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts

arXiv cs.LG

arXiv:2605.03348v1. We propose S3 (Specialization, Selection, Sparsification), a framework that rethinks multimodal learning from a structural perspective. Instead of encoding all signals into a single fixed embedding, S3 decomposes multimodal inputs into semantic experts and routes them selectively for each task. Specialization forms concept-level experts in a shared latent space, Selection adapts the routing to each task's needs, and Sparsification prunes low-utility paths to yield compact, information-minimal representations.
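The three stages can be pictured with a small mixture-of-experts routing sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how experts, routers, or pruning are parameterized, so the linear experts, the softmax router, the top-k pruning rule, and every name below (`route_sparse`, `expert_weights`, `router_weights`, `k`) are hypothetical stand-ins.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def route_sparse(x, expert_weights, router_weights, k):
    """Hypothetical sketch: send one fused input through its top-k experts.

    x:              (D,)      fused multimodal feature vector
    expert_weights: (E, D, H) one linear map per expert into a shared latent space
    router_weights: (E, D)    task-specific router (assumed: one per task)
    k:              number of expert paths kept after pruning
    """
    # Specialization (assumed linear): each expert projects x into the shared space.
    expert_outputs = np.einsum("edh,d->eh", expert_weights, x)  # (E, H)
    # Selection: the task's router scores every expert for this input.
    gates = softmax(router_weights @ x)  # (E,)
    # Sparsification (assumed top-k): drop low-scoring paths, renormalize the rest.
    keep = np.argsort(gates)[-k:]
    sparse_gates = np.zeros_like(gates)
    sparse_gates[keep] = gates[keep]
    sparse_gates /= sparse_gates.sum()
    # The representation mixes only the surviving experts.
    return sparse_gates @ expert_outputs, sparse_gates

# Usage with random stand-in parameters (no trained weights are implied).
rng = np.random.default_rng(0)
E, D, H = 4, 8, 3
x = rng.normal(size=D)
experts = rng.normal(size=(E, D, H))
router_task_a = rng.normal(size=(E, D))
z, g = route_sparse(x, experts, router_task_a, k=2)
```

Swapping in a different `router_weights` matrix for another task changes which experts survive, while the experts themselves stay shared. Top-k gating is used here only as a simple proxy for pruning low-utility paths; the paper's actual criterion may differ.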