Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing

ArXi:2603.11535v1 Announce Type: new Token-choice Mixture-of-Experts (TC-MoE) routes each token to a fixed number of experts, limiting dynamic computation allocation and requiring auxiliary losses to maintain load balance. We propose Expert Threshold (ET) routing, where each expert maintains an exponential moving average (EMA) threshold estimated from the global token distribution. At both