AI RESEARCH
ANO: A Principled Approach to Robust Policy Optimization
arXiv cs.LG
arXiv:2605.02320v1 Announce Type: cross Proximal Policy Optimization (PPO) dominates deep RL but faces a fundamental dilemma. Its "hard clipping" mechanism discards valuable gradient information from outliers, leading to sample inefficiency. Conversely, removing clipping (as in SPO) exposes optimization to unbounded gradients, causing significant instability and hyperparameter sensitivity. To resolve this, we establish a Unified Trust Region Framework that generalizes existing objectives.
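To make the dilemma concrete, here is a minimal sketch of the per-sample gradient with respect to the action log-probability under the standard PPO clipped surrogate and under a plain unclipped importance-weighted surrogate. It is an illustration only: the function names and the default `eps=0.2` are assumptions, the unclipped surrogate is a generic stand-in rather than SPO's actual objective, and nothing here reproduces the unified framework the abstract proposes.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return min(max(x, lo), hi)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps is the clip range (assumed 0.2).
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)


def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Gradient of the PPO surrogate w.r.t. log pi_new(a|s).

    Since d(ratio)/d(log pi_new) = ratio, the gradient is ratio * advantage
    while the unclipped term is active, and exactly zero once the clipped
    (constant) term is selected by the min.
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    if unclipped <= clipped:
        return ratio * advantage
    return 0.0  # hard clipping: this sample contributes no gradient


def unclipped_grad(ratio, advantage):
    """Gradient of the plain surrogate ratio * advantage w.r.t. log pi_new(a|s).

    Generic illustration of removing the clip; not SPO's objective.
    """
    return ratio * advantage


if __name__ == "__main__":
    advantage = 1.0
    for ratio in (1.0, 1.1, 1.5, 5.0, 50.0):
        print(
            f"ratio={ratio:5.1f}  "
            f"ppo_grad={ppo_clip_grad(ratio, advantage):6.2f}  "
            f"unclipped_grad={unclipped_grad(ratio, advantage):6.2f}"
        )
```

With a positive advantage, the PPO gradient drops to zero as soon as the ratio leaves the clip range, so outlier samples stop contributing. The unclipped gradient instead grows linearly with the ratio and has no upper bound.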