AI RESEARCH
FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
arXiv cs.LG
arXiv:2604.02715v1 Announce Type: new
Mixture-of-Experts (MoE) models have become a dominant paradigm for scaling large language models, but their rapidly growing parameter sizes