AI RESEARCH

Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing

arXiv CS.LG

arXiv:2605.05940v1

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive reinforcement learning (RL) frameworks. To improve efficiency, we propose Near-Policy Distillation (NPD), an asynchronous approach that decouples student generation from