AI RESEARCH

Robust Policy Optimization to Prevent Catastrophic Forgetting

arXiv CS.LG • May 13, 2026

arXiv:2602.08813v2 Announce Type: replace Large language models are commonly trained through multi-stage post-