AI RESEARCH
Robust Policy Optimization to Prevent Catastrophic Forgetting
arXiv CS.LG
arXiv:2602.08813v2 Announce Type: replace Large language models are commonly trained through multi-stage post-