When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models

arXiv:2605.07260v1

Mixture-of-Experts (MoE) language models route each token to a small subset of experts, but whether the routes selected by a trained top-$k$ router are good ones is rarely evaluated directly. Holding the model fixed, we compare each standard route against sampled equal-compute alternatives for the same token and score each route by the next-token probability it assigns to the realized token in a verified reasoning trajectory.
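The comparison described above can be illustrated with a minimal sketch on a toy single-layer MoE. Everything here is an assumption made for illustration and not the paper's implementation: the random linear experts, the names `route_prob` and `compare_routes`, uniform sampling of alternative $k$-subsets (so every alternative uses the same number of experts as the standard route), and gate weights obtained by renormalizing the router logits over the selected experts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def rand_matrix(rng, rows, cols):
    """Random matrix scaled by 1/sqrt(cols) to keep activations moderate."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def route_prob(route, h, router_logits, experts, unembed, target):
    """Probability of `target` when only the experts in `route` are active.

    Assumption: gates are the router logits renormalized over the route.
    """
    gates = softmax([router_logits[e] for e in route])
    mixed = [0.0] * len(h)
    for gate, e in zip(gates, route):
        out = matvec(experts[e], h)
        mixed = [m + gate * o for m, o in zip(mixed, out)]
    return softmax(matvec(unembed, mixed))[target]


def compare_routes(h, router, experts, unembed, target, k, n_samples, rng):
    """Score the top-k route and `n_samples` random k-expert alternatives."""
    router_logits = matvec(router, h)
    n_experts = len(experts)
    ranked = sorted(range(n_experts), key=lambda e: -router_logits[e])
    standard = tuple(sorted(ranked[:k]))
    p_standard = route_prob(standard, h, router_logits, experts, unembed, target)

    scored = []
    while len(scored) < n_samples:
        # Equal compute: each alternative activates exactly k experts.
        cand = tuple(sorted(rng.sample(range(n_experts), k)))
        if cand == standard:
            continue
        p = route_prob(cand, h, router_logits, experts, unembed, target)
        scored.append((cand, p))
    return standard, p_standard, scored


# Toy setup (all sizes and weights are hypothetical).
rng = random.Random(0)
d, n_experts, k, vocab, n_samples = 8, 8, 2, 16, 20
h = [rng.gauss(0.0, 1.0) for _ in range(d)]        # hidden state of one token
router = rand_matrix(rng, n_experts, d)            # router weights
experts = [rand_matrix(rng, d, d) for _ in range(n_experts)]
unembed = rand_matrix(rng, vocab, d)               # output projection
target = 3                                         # index of the realized token

standard, p_standard, scored = compare_routes(
    h, router, experts, unembed, target, k, n_samples, rng
)
frac_better = sum(p > p_standard for _, p in scored) / len(scored)
print(f"standard route {standard}: p(target) = {p_standard:.4f}")
print(f"fraction of sampled alternatives that beat it: {frac_better:.2f}")
```

In this sketch the model weights never change between routes, so any difference in `p(target)` is attributable to expert selection alone. In a real model the same comparison would be run per token and per MoE layer, with the target taken from a verified trajectory rather than fixed by hand.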