Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization

arXiv:2604.03110v1

Knowledge distillation is an effective technique for compressing pre-trained language models. However, existing methods focus only on the knowledge distribution among layers, which may cause the loss of fine-grained information in the alignment process. To address this issue, we