AI RESEARCH
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
arXiv CS.AI
arXiv:2603.12645v1 Announce Type: cross Mixture-of-Experts (MoE) based Large Language Models (LLMs) have demonstrated impressive performance and computational efficiency. However, their deployment is often constrained by substantial memory demands, primarily due to the need to load numerous expert modules. While existing expert compression techniques like pruning or merging attempt to mitigate this, they often suffer from irreversible knowledge loss or high