Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding

arXiv:2510.21356v2 Announce Type: replace-cross Eye gaze offers valuable cues about attention, short-term intent, and future actions, making it a powerful signal for modeling egocentric behavior. In this work, we propose a gaze-regularized framework that enhances VLMs for two key egocentric understanding tasks: fine-grained future event prediction and current activity understanding. Unlike prior approaches that rely solely on visual inputs or use gaze as an auxiliary input signal, our method uses gaze only during.