AI RESEARCH

RL Fine-Tuning Heals OOD Forgetting in SFT

arXiv CS.AI

arXiv:2509.12235v3 Announce Type: replace-cross Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) is a standard post-