AI RESEARCH
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
arXiv cs.LG
arXiv:2604.01622v1. Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, which leads to load imbalance and rigid allocation of computation. We show that expert-choice (EC) routing is a better fit for DLMs: it balances load deterministically by design, yielding higher throughput and faster convergence than TC. Building on the property that EC capacity is externally controllable, we…
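To make the contrast between the two routing schemes concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the random router scores, the skew toward expert 0, and the sizes are all illustrative assumptions. It only shows why token-choice load depends on the scores while expert-choice load is fixed by an externally chosen capacity.

```python
import numpy as np

def token_choice_route(scores, k=1):
    """Token-choice: every token picks its top-k experts.

    `scores` is a hypothetical (num_tokens, num_experts) router affinity matrix.
    Returns the number of tokens each expert receives, which depends on the scores.
    """
    num_experts = scores.shape[1]
    top_experts = np.argsort(-scores, axis=1)[:, :k]  # (num_tokens, k)
    return np.bincount(top_experts.ravel(), minlength=num_experts)

def expert_choice_route(scores, capacity):
    """Expert-choice: every expert picks its top-`capacity` tokens.

    Returns a (num_experts, capacity) array of token indices, so each expert
    processes exactly `capacity` tokens regardless of the scores.
    """
    return np.argsort(-scores, axis=0)[:capacity, :].T

# Illustrative setup (assumed values, not from the paper).
rng = np.random.default_rng(0)
num_tokens, num_experts = 16, 4
scores = rng.normal(size=(num_tokens, num_experts))
scores[:, 0] += 2.0  # skew the router toward expert 0 to provoke imbalance

tc_load = token_choice_route(scores, k=1)
ec_assign = expert_choice_route(scores, capacity=4)
ec_load = np.array([len(row) for row in ec_assign])

print("token-choice load per expert: ", tc_load)
print("expert-choice load per expert:", ec_load)

# Capacity is an external knob: changing it changes compute per expert directly.
ec_small = expert_choice_route(scores, capacity=2)
print("expert-choice with capacity=2:", ec_small.shape)
```

Under expert-choice routing a token may be selected by several experts or by none, so the per-token amount of computation varies even though the per-expert load is constant.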