MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models

ArXi:2603.24984v1 Announce Type: new Mixture-of-Experts (MoE) has emerged as an effective approach to reduce the computational overhead of Transformer architectures by sparsely activating a subset of parameters for each token while preserving high model capacity. This paradigm has recently been extended to Vision-Language Models (VLMs), enabling scalable multi-modal understanding with reduced computational cost. However, the widely adopted deterministic top-K routing mechanism may overlook optimal expert combinations and lead to expert overfitting.