AI RESEARCH

Robust Policy Optimization to Prevent Catastrophic Forgetting

arXiv CS.LG

arXiv:2602.08813v2 Announce Type: replace Large language models are commonly trained through multi-stage post-