Phase-Aware Mixture of Experts for Agentic Reinforcement Learning

arXiv:2602.17038v3

Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods typically train a \emph{single} policy network, which causes \emph{simplicity bias}: simple tasks occupy most of the parameters and dominate the gradient updates, leaving insufficient capacity for complex tasks.