AI RESEARCH
$\phi$-Balancing for Mixture-of-Experts Training
arXiv cs.LG
arXiv:2605.15403v1 Announce Type: new Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics,