Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

arXiv:2605.11458v1. On-policy self-distillation has become a strong recipe for LLM reasoning: a privileged teacher, conditioned on the reference solution, supervises the student's own rollouts. Nearly all such methods, however, share one design choice that has gone unquestioned: the teacher always sees the full reference reasoning.