GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding

arXiv:2603.25841v1 Announce Type: cross Current multimodal large language models (MLLMs) cannot effectively utilize eye-gaze information for video understanding, even when gaze cues are supplied via visual overlays or text descriptions. We