AI RESEARCH
On-Policy Distillation with Best-of-N Teacher Rollout Selection
arXiv cs.CV
arXiv:2605.09725v1 Announce Type: new On-policy distillation (OPD), which supervises a student on its own sampled trajectories, has emerged as a data-efficient post-