Fine-tuning MLLMs Without Forgetting Is Easier Than You Think

arXiv:2603.14493v1 Announce Type: cross The paper demonstrates that simple adjustments to the fine-tuning recipes of multimodal large language models (MLLMs) are sufficient to mitigate catastrophic forgetting. On visual question answering, we design a 2x2 experimental framework to assess model performance across in-distribution and out-of-distribution image and text inputs. Our results show that appropriate regularization, such as cons