AI RESEARCH

Route Experts by Sequence, not by Token

arXiv CS.AI

arXiv:2511.06494v2 Announce Type: replace-cross

Mixture-of-Experts (MoE) architectures scale large language models (LLMs) by activating only a subset of experts per token, but standard TopK routing assigns the same fixed number of experts to every token, ignoring differences in token complexity. Prior adaptive routing methods
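
To make the fixed-budget behaviour of standard TopK routing concrete, the following is a minimal sketch in plain Python for a single MoE layer. It is an illustration under stated assumptions, not the paper's implementation: the helper names (`softmax`, `topk_route`), the choice to renormalise weights over the selected experts, and the router logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_route(router_logits, k):
    """Return k (expert_index, weight) pairs for one token.

    Assumption: gate weights are renormalised over the selected experts,
    which is one common convention among several.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical router logits for two tokens over four experts.
peaked_token = [4.0, 0.1, 0.0, -1.0]  # one expert clearly dominates
flat_token = [1.0, 0.9, 1.1, 0.8]     # experts score almost equally

routes = [topk_route(t, k=2) for t in (peaked_token, flat_token)]
for route in routes:
    print([(i, round(w, 3)) for i, w in route])
```

Both tokens receive exactly `k = 2` experts, regardless of how their router scores are distributed. This is the uniform per-token budget that the abstract describes standard TopK routing as imposing.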