
RL Fine-Tuning Heals OOD Forgetting in SFT

arXiv CS.AI • May 12, 2026

arXiv:2509.12235v3. Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) is a standard post-
