Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting

arXiv:2508.04227v2

Vision-language models (VLMs) and the recent surge of Multimodal Large Language Models (MLLMs) have revolutionized artificial intelligence with unprecedented cross-modal alignment and zero-shot generalization. However, enabling them to learn continually from non-stationary data remains a major challenge, because their cross-modal alignment and generalization capabilities are particularly vulnerable to catastrophic forgetting.