Stable On-Policy Distillation through Adaptive Target Reformulation
arXiv cs.LG
arXiv:2601.07155v2 Announce Type: replace

Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from large language models to smaller student models; however, conventional supervised KD often suffers from a distribution mismatch between