Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees

arXiv:2604.06515v1 Announce Type: new Sparse Mixture-of-Experts (MoE) allows language and vision models to scale efficiently by activating only a small subset of experts per input. While this reduces computation, the large number of parameters still incurs substantial memory overhead during inference. Post-