AI RESEARCH
Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
arXiv cs.AI
arXiv:2601.07224v2. While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for