BicKD: Bilateral Contrastive Knowledge Distillation

arXiv:2602.01265v2 Announce Type: replace Knowledge distillation (KD) is a machine learning framework that transfers knowledge from a teacher model to a student model. The vanilla KD proposed by Hinton has been the dominant approach in logit-based distillation and demonstrates compelling performance. However, it only performs sample-wise probability alignment between the teacher's and the student's predictions, lacking a mechanism for class-wise comparison. Besides, vanilla KD imposes no structural constraint on the probability space.
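
To make the "sample-wise" point concrete, here is a minimal sketch of the vanilla KD objective as it is commonly formulated: a temperature-scaled KL divergence between teacher and student class probabilities, computed independently for each sample and then averaged. It is not taken from this paper. The function names, the default temperature of 4.0, and the T^2 scaling convention are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one sample's class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss_per_sample(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) for a single sample, scaled by T^2.

    Assumption: this follows the commonly used vanilla KD formulation;
    the names and defaults here are illustrative only.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def vanilla_kd_loss(teacher_batch, student_batch, temperature=4.0):
    """Mean of the per-sample losses over a batch of logit vectors."""
    losses = [
        kd_loss_per_sample(t, s, temperature)
        for t, s in zip(teacher_batch, student_batch)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
    student = [[1.5, 1.2, 0.3], [0.4, 2.0, 0.9]]
    print(vanilla_kd_loss(teacher, student))
```

Each term in this sketch compares the teacher's and student's distributions for one sample only. Nothing in it compares how a given class is scored across different samples, and nothing constrains the overall structure of the predicted probabilities, which are the two gaps the abstract identifies.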