Distillation Traps and Guards: A Calibration Knob for LLM Distillability

arXiv:2604.18963v1 Announce Type: cross Knowledge distillation (KD) transfers capabilities from large language models (LLMs) to smaller students, yet it can fail unpredictably and also underpins model leakage risks. Our analysis revealed several distillation traps (tail noise, off-policy instability, and, most fundamentally, the teacher-student gap) that distort