Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

arXiv:2601.14249v4 Announce Type: replace Long chain-of-thought (CoT) trajectories provide rich supervision signals for distilling reasoning from teacher to student LLMs. However, both prior work and our experiments show that trajectories from stronger teachers do not necessarily yield better students, which highlights the importance of data-student suitability in distillation. Existing methods assess suitability primarily through student likelihood: they favor trajectories that align closely with the student model's current behavior but overlook more informative ones.