AI RESEARCH

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

arXiv CS.AI

arXiv:2603.25562v1 Announce Type: cross On-policy distillation (OPD) is appealing for large language model (LLM) post-