Stable On-Policy Distillation through Adaptive Target Reformulation

arXiv:2601.07155v2 Announce Type: replace Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from large language models to smaller student models; however, conventional supervised KD often suffers from a distribution mismatch between