DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning

arXiv:2604.18829v1 Announce Type: new

Multimodal large language models (MLLMs) have achieved impressive performance on visual perception and reasoning tasks with RGB imagery, yet they remain fragile under common degradations such as fog, blur, and low-light conditions. Infrared (IR) imaging, a well-established complement to RGB, is inherently robust under these conditions, but its integration into MLLMs remains underexplored.