AI RESEARCH

$\phi$-Balancing for Mixture-of-Experts Training

arXiv cs.LG

arXiv:2605.15403v1 Announce Type: new Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics,