AI RESEARCH
CRAFT: Cost-aware Expert Replica Allocation with Fine-Grained Layerwise Estimations
arXiv CS.LG
arXiv:2603.28768v1 Announce Type: cross Mixture-of-Experts (MoE) has recently emerged as the mainstream architecture for efficiently scaling large language models while maintaining near-constant computational cost. Expert parallelism distributes parameters by partitioning experts across devices, but this