AI RESEARCH

Swimba: Switch Mamba Model Scales State Space Models

arXiv cs.LG

arXiv:2603.06938v1

Mixture-of-experts (MoE) is a common approach for increasing parameter capacity, but applying MoE to state space model (SSM) token mixers can multiply the cost of the recurrent state update. We study how to