Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch
arXiv cs.CL
arXiv:2603.22056v1 Announce Type: new
Large language models (LLMs) achieve state-of-the-art (SOTA) performance across language tasks, but are costly to deploy due to their size and resource demands. Knowledge Distillation (KD) addresses this by