AI RESEARCH
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
arXiv CS.AI
arXiv:2508.11408v3 Announce Type: replace-cross
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-