Uncovering Intra-expert Activation Sparsity for Efficient Mixture-of-Expert Model Execution

arXiv:2605.08575v1 Announce Type: cross The Mixture of Experts (MoE) architecture has become the standard for state-of-the-art large language models, owing to the computational efficiency it gains from sparse expert activation. However, sparsity through finer expert granularity is becoming increasingly difficult to achieve due to fundamental