Online In-Context Distillation for Low-Resource Vision Language Models

arXiv:2510.18117v2 Announce Type: replace As the field continues its push for ever more resources, this work turns the spotlight on a critical question: how can vision-language models (VLMs) be adapted to thrive in low-resource, budget-constrained settings? While large VLMs offer strong performance, they are impractical to deploy in such settings. Small VLMs, on the other hand, are efficient but typically require costly fine-tuning to close the performance gap with larger models in the deployment domain.