AI RESEARCH

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

arXiv CS.AI

arXiv:2601.07389v2 Announce Type: replace-cross Post-