Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

arXiv:2604.14629v1 Announce Type: new Vision-Language Models (VLMs) have shown remarkable capabilities in joint vision-language understanding, but their large scale poses significant challenges for deployment in resource-constrained scenarios. Knowledge Distillation (KD) offers a viable way to improve model capabilities without increasing model size or data requirements, which makes deployment more efficient.