AI RESEARCH
Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training
arXiv cs.AI
arXiv:2604.04230v1 Announce Type: cross We model Mixture-of-Experts (MoE) token routing as a congestion game with a single effective parameter, the congestion coefficient gamma_eff, that quantifies the balance-quality tradeoff. Tracking gamma_eff across