AI RESEARCH

Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling

arXiv CS.CV

arXiv:2604.13508v1 Announce Type: new Sparse Upcycling provides an efficient way to initialize a Mixture-of-Experts (MoE) model from pretrained dense weights instead of