BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

arXiv:2605.14438v1 Announce Type: new

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods either require costly re
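To make the "fixed Top-K" criticism concrete, here is a minimal sketch of a generic Top-K MoE router. It is not BEAM's method, which the truncated abstract does not describe. Assumptions: each token is a scalar, each expert is a plain Python function, and gate weights are softmax probabilities renormalized over the selected experts (a common convention; some implementations skip renormalization). The names `top_k_route` and `moe_forward` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.

    Hypothetical helper: gate weights are softmax probabilities renormalized
    so that the selected experts' weights sum to 1.
    """
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_forward(
    x: float,
    logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int,
) -> Tuple[float, int]:
    """Combine the outputs of the routed experts for one (scalar) token.

    Returns the mixed output and the number of experts evaluated. With fixed
    Top-K routing this count is always k, however confident the router is.
    """
    routes = top_k_route(logits, k)
    output = sum(w * experts[i](x) for i, w in routes)
    return output, len(routes)


experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward(1.0, [0.0, 0.0, 0.0, 0.0], experts, k=2))
print(moe_forward(1.0, [10.0, -10.0, -10.0, -10.0], experts, k=2))
```

In the second call the router puts almost all probability on expert 0, yet the second expert is still evaluated. This per-token cost, fixed at k experts regardless of confidence, is the redundancy that dynamic routing schemes aim to reduce.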