AI RESEARCH

Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs

arXiv CS.LG

arXiv:2602.15091v2 (Announce Type: replace-cross)

Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel that operates at a finite information rate.
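One way to make the idea of a finite-rate gate concrete is sketched below. It treats the gate as a discrete channel from an input type X to a selected expert E and measures the gate's rate as the mutual information I(X;E) in bits. A rate budget R then caps how much routing information the gate can carry. This is an illustration under stated assumptions, not the letter's method: the temperature-scaled softmax gate, the 4-input/4-expert setup, and the names `gate_rate_bits` and `min_temperature_for_rate` are all hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax; each row is a gate distribution p(e | x)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gate_rate_bits(p_x: np.ndarray, p_e_given_x: np.ndarray) -> float:
    """Mutual information I(X; E) in bits for a discrete gating channel.

    p_x:          shape (n_inputs,), distribution over input types X
    p_e_given_x:  shape (n_inputs, n_experts), each row sums to 1
    """
    p_e = p_x @ p_e_given_x                  # marginal expert usage p(e)
    joint = p_x[:, None] * p_e_given_x       # joint distribution p(x, e)
    mask = joint > 0                         # use the 0 * log 0 = 0 convention
    p_e_full = np.broadcast_to(p_e, joint.shape)
    return float(np.sum(joint[mask] * np.log2(p_e_given_x[mask] / p_e_full[mask])))


def min_temperature_for_rate(logits, p_x, rate_budget, lo=1e-3, hi=1e3, iters=60):
    """Hypothetical helper: approx. smallest temperature with I(X;E) <= rate_budget.

    Assumes the rate decreases monotonically as temperature grows. That holds for
    the symmetric example below, but not necessarily for arbitrary logits.
    """
    for _ in range(iters):
        mid = np.sqrt(lo * hi)               # bisect on a log scale
        if gate_rate_bits(p_x, softmax(logits, mid)) > rate_budget:
            lo = mid                         # gate still too informative: heat it up
        else:
            hi = mid                         # within budget: try a colder gate
    return hi


if __name__ == "__main__":
    n = 4                                    # illustrative: 4 input types, 4 experts
    p_x = np.full(n, 1.0 / n)                # uniform input distribution (assumption)
    logits = 5.0 * np.eye(n)                 # each input type "prefers" one expert
    for t in (0.1, 1.0, 10.0):
        rate = gate_rate_bits(p_x, softmax(logits, t))
        print(f"T={t:5.1f}  I(X;E)={rate:.3f} bits")
    t_star = min_temperature_for_rate(logits, p_x, rate_budget=1.0)
    print(f"smallest T with I(X;E) <= 1 bit: {t_star:.3f}")
```

With n experts and a uniform input distribution, a deterministic one-to-one gate reaches the maximum of log2(n) bits, while a uniform gate carries 0 bits. Raising the temperature moves the gate between these extremes. Under a rate budget below log2(n), the gate is therefore forced to route stochastically, which is one plausible reading of the "finite information rate" constraint.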