AI RESEARCH
Scalable Training of Mixture-of-Experts Models with Megatron Core
arXiv cs.LG
arXiv:2603.07685v1 Announce Type: cross Scaling Mixture-of-Experts (MoE)