Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations
arXiv cs.AI
arXiv:2604.14246v1 Announce Type: cross
Sparse Mixture-of-Experts (MoE) models have achieved remarkable scalability, yet they remain vulnerable to hallucinations, particularly when processing long-tail knowledge. We trace this fragility to static Top-k routing: routers tend to favor high-frequency patterns over rare factual associations. Consequently, "specialist experts" that possess critical long-tail knowledge are often assigned low gating scores and remain "dormant", that is, under-prioritized for specific tokens despite their proven causal importance on other inputs.
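To make the "dormant expert" failure mode concrete, here is a minimal sketch of generic static Top-k gating for a single token. It is not the paper's counterfactual routing method, which this abstract does not specify. The function name `top_k_route`, the router logits, and the choice to apply softmax only over the selected experts (one common convention; other models normalize over all experts first) are illustrative assumptions. The point it shows: selection depends only on the router's scores, so an expert ranked just below the cutoff contributes nothing to that token, however relevant its knowledge might be.

```python
import math
from typing import List, Tuple


def top_k_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper: weights are a softmax over the selected logits only
    (an assumed convention); every unselected expert gets zero weight.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Rank experts by router logit, highest first, and keep the top k.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the chosen experts (subtract the max for stability).
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router logits for one token over 4 experts.
# Expert 3 stands in for a "specialist" scoring just below the Top-2 cutoff.
logits = [2.0, 1.5, 0.3, 1.2]
routed = top_k_route(logits, k=2)
print(routed)

# The specialist is never executed for this token under static Top-k routing.
print(3 in {i for i, _ in routed})  # False
```

Because the cutoff is a hard rank threshold, a small gap in logits (1.2 vs. 1.5 here) becomes an all-or-nothing decision, which is the static behavior the abstract identifies as the source of dormant specialists.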