AI RESEARCH

Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training

arXiv CS.AI

arXiv:2604.04230v1 Announce Type: cross We model Mixture-of-Experts (MoE) token routing as a congestion game with a single effective parameter, the congestion coefficient gamma_eff, that quantifies the balance-quality tradeoff. Tracking gamma_eff across