Distribution Corrected Offline Data Distillation for Large Language Models

arXiv:2605.14071v1 Announce Type: new Distilling reasoning traces from strong large language models into smaller ones is a promising route to improve intelligence in resource-constrained settings. Existing approaches face a fundamental trade-off: offline distillation from teacher-generated traces provides high-quality, sample-efficient supervision but suffers from distributional drift: during