SODA: Semi On-Policy Black-Box Distillation for Large Language Models

arXiv:2604.03873v1

Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student's inherent errors. Fully on-policy methods (e.g., Generative Adversarial Distillation) solve this via adversarial