L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts

arXiv:2601.21349v2 Announce Type: replace

Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization.
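
To make the scale-sensitivity concern concrete, the following is a minimal sketch of a generic linear top-k router, not the L2R method itself. The names `linear_router`, `W`, `d_model`, `n_experts`, and `k` are hypothetical and chosen only for illustration, and the weights are random rather than trained. It shows that multiplying a token representation by a positive constant leaves the selected experts unchanged but alters the gate weights, since dot-product logits grow linearly with the input norm.

```python
import math
import random


def linear_router(x, W, k):
    """Generic linear top-k router (illustrative assumption, not L2R).

    Scores each expert with a dot product in the raw representation space,
    keeps the k highest-scoring experts, and applies a softmax over them.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    # Indices of the k largest logits, in descending order of score.
    top = sorted(range(len(W)), key=lambda e: logits[e], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]
    z = sum(exps)
    return top, [v / z for v in exps]


rng = random.Random(0)
d_model, n_experts, k = 16, 4, 2  # hypothetical sizes

# Random router weights and a random token representation.
W = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(d_model)]

# Route the same token at its original scale and at 10x the scale.
experts_a, gates_a = linear_router(x, W, k)
experts_b, gates_b = linear_router([10.0 * v for v in x], W, k)

print("experts:", experts_a, experts_b)
print("gates at scale 1: ", gates_a)
print("gates at scale 10:", gates_b)
```

In this sketch the direction of the token is identical in both calls; only its norm differs. The gate distribution nevertheless sharpens toward the top expert at the larger scale, which is one way scale-sensitive scoring can affect how strongly each expert is weighted.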