AI RESEARCH

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

arXiv CS.AI

arXiv:2508.11408v3 Announce Type: replace-cross Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-