AI RESEARCH
RL Fine-Tuning Heals OOD Forgetting in SFT
arXiv cs.AI
arXiv:2509.12235v3 Announce Type: replace-cross Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) is a standard post-