EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer

arXiv:2605.01732v1 Announce Type: new Large language models (LLMs) have achieved remarkable performance across diverse domains, yet their enormous computational and memory requirements hinder deployment in resource-constrained environments. Knowledge distillation offers a promising solution by transferring knowledge from a large teacher model to a smaller student model. However, existing distillation methods typically treat all tokens equally, even though different tokens contribute unequally to model decisions.
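
To make the "all tokens treated equally" baseline concrete, here is a minimal sketch of a conventional token-level distillation loss: the KL divergence between teacher and student distributions is computed at each position and then averaged with the same weight for every token. This illustrates the uniform baseline only, not the EGAD method. The function names, the temperature parameter, and the toy logits are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert one position's logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def uniform_token_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Hypothetical baseline: mean of per-token KL(teacher || student).

    Each argument is a list of per-position logit lists. Every position
    receives the same weight, 1 / number_of_tokens.
    """
    per_token = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token), per_token


# Toy example (made-up logits): position 0 has a confident teacher,
# position 1 has a nearly uniform, high-entropy teacher.
teacher = [[4.0, 0.0, 0.0], [0.1, 0.0, -0.1]]
student = [[1.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
loss, per_token = uniform_token_kd_loss(teacher, student)
print(loss, per_token)
```

In this sketch both positions contribute with weight 1/2 regardless of how confident or uncertain the teacher is at each one, which is the uniform treatment the abstract argues against.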