Learning to See through Illumination Extremes with Event Streaming in Multimodal Large Language Models

arXiv:2603.27558v1 Announce Type: new Multimodal Large Language Models (MLLMs) show strong vision-language reasoning under standard conditions but fail under extreme illumination, where RGB inputs irrevocably lose structure and semantics. We propose Event-MLLM, an event-enhanced model that performs all-light visual reasoning by dynamically fusing event streams with RGB frames.