Task-Specific Knowledge Distillation via Intermediate Probes

arXiv:2603.12270v1

Knowledge distillation from large language models (LLMs) assumes that the teacher's output distribution is a high-quality