Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing

arXiv:2605.05940v1 Announce Type: new Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning (RL) frameworks. To improve efficiency, we propose Near-Policy Distillation (NPD), an asynchronous approach that decouples student generation from