AI RESEARCH

On-Policy Distillation with Best-of-N Teacher Rollout Selection

arXiv CS.CV

arXiv:2605.09725v1 Announce Type: new

On-policy distillation (OPD), which supervises a student on its own sampled trajectories, has emerged as a data-efficient post-