AI RESEARCH
The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs
arXiv CS.LG
arXiv:2605.08737v1 Announce Type: new
On-policy distillation (OPD) is widely used for LLM post-