Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

arXiv:2605.04952v1 Announce Type: new Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can dominate computation. We