Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

arXiv:2604.21330v1

Recent progress in deep learning has been driven by increasingly large-scale models, but the resulting computational cost has become a critical bottleneck. Sparse Mixture of Experts (MoE) offers an effective solution by activating only a small subset of experts for each input, achieving high scalability without sacrificing inference speed. Although effective, sparse MoE