AI RESEARCH

Stable On-Policy Distillation through Adaptive Target Reformulation

arXiv CS.LG

arXiv:2601.07155v2 Announce Type: replace

Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from large language models to smaller student models; however, conventional supervised KD often suffers from a distribution mismatch between