LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing

arXiv:2603.12645v1 Announce Type: cross Mixture-of-Experts (MoE) based Large Language Models (LLMs) have demonstrated impressive performance and computational efficiency. However, their deployment is often constrained by substantial memory demands, primarily due to the need to load numerous expert modules. While existing expert compression techniques like pruning or merging attempt to mitigate this, they often suffer from irreversible knowledge loss or high