Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following

arXiv:2603.19482v1 Announce Type: new Large vision language models (LVLMs) have demonstrated impressive performance across a wide range of tasks. These capabilities largely stem from visual instruction tuning, which fine-tunes models on datasets consisting of curated image-instruction-output triplets. However, in the medical domain, constructing large-scale, high-quality instruction datasets is particularly challenging due to the need for specialized expert knowledge.