AI RESEARCH
Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty
arXiv cs.LG
arXiv:2602.12687v2 Announce Type: replace
The core of knowledge distillation lies in transferring the teacher's rich 'dark knowledge': subtle probabilistic patterns that reveal how classes are related and how uncertainty is distributed across them. While this idea is well established, teachers trained with conventional cross-entropy often fail to preserve such signals. Their distributions collapse into sharp, overconfident peaks that appear decisive but are in fact brittle, offering little beyond the hard label or subtly hindering representation-level transfer.
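To make the collapse concrete, here is a minimal sketch (not the paper's method) comparing two teacher output distributions over three classes. The logits, class names, and helper functions are assumptions chosen purely for illustration; entropy is used as a rough proxy for how much information the soft label carries beyond the hard label.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for classes (cat, dog, truck); values are illustrative only.
overconfident_logits = [12.0, 2.0, -3.0]  # sharp peak, as from an overconfident teacher
softer_logits = [3.0, 1.5, -2.0]          # same ranking, less extreme margins

sharp = softmax(overconfident_logits)
soft = softmax(softer_logits)

# The sharp distribution is almost a one-hot label: nearly all mass sits on "cat",
# so the relative similarity of "dog" vs "truck" is barely visible to a student.
print("sharp:", [round(p, 6) for p in sharp], "entropy:", entropy(sharp))
print("soft: ", [round(p, 6) for p in soft], "entropy:", entropy(soft))
```

In this toy setting both distributions agree on the top class and on the class ranking, but only the softer one assigns enough probability to the non-target classes for their ordering to act as a usable training signal.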