AI RESEARCH

The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs

arXiv CS.LG

arXiv:2605.08737v1 Announce Type: new On-policy distillation (OPD) is widely used for LLM post-