AI RESEARCH
Swimba: Switch Mamba Model Scales State Space Models
arXiv CS.LG
arXiv:2603.06938v1 Announce Type: new Mixture-of-experts (MoE) is a common approach for increasing parameter capacity, but applying MoE to state space model (SSM) token mixers can multiply the cost of the recurrent state update. We study how to