MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation

arXiv:2605.01374v1

Knowledge distillation is a key technique for compressing large language models (LLMs), but most existing methods align representations at fixed layers or at token-level outputs, ignoring how representations evolve across depth. As a result, the student is only weakly guided to capture the teacher's internal relational structure, which limits knowledge transfer.