AI RESEARCH

Scalable Training of Mixture-of-Experts Models with Megatron Core

arXiv CS.LG

arXiv:2603.07685v1 Announce Type: cross

Scaling Mixture-of-Experts (MoE)