AI RESEARCH
KL for a KL: On-Policy Distillation with Control Variate Baseline
arXiv CS.AI
arXiv:2605.07865v1 Announce Type: cross On-Policy Distillation (OPD) has emerged as a dominant post-