AI RESEARCH
Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
arXiv CS.AI
arXiv:2604.08880v1 Announce Type: cross Chain-of-thought (CoT) distillation transfers reasoning behaviors from a strong teacher to a smaller student, but prior work reports a capacity gap: distillation may fail when the teacher-student capability mismatch is large. We revisit the capacity gap from a practical perspective by re-examining commonly used experimental settings. Notably, we find that CoT distillation often degrades performance compared to the student's pre-distillation baseline, an issue obscured when only post-distillation comparisons are reported. We therefore ...