AI RESEARCH

Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration

arXiv CS.AI

arXiv:2601.07224v2 Announce Type: replace While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for