AI RESEARCH
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
arXiv CS.LG
arXiv:2604.19835v1 Announce Type: new Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active computation, model quality scales predictably with total parameters, and MoEs realize this by increasing expert count. However