AI RESEARCH
Phase-Aware Mixture of Experts for Agentic Reinforcement Learning
arXiv cs.AI
arXiv:2602.17038v3 (announce type: replace). Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods typically use a single policy network, which causes simplicity bias: simple tasks occupy most of the parameters and dominate gradient updates, leaving insufficient capacity for complex tasks.