EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams

arXiv:2605.07299v1 Announce Type: cross Existing Multimodal Large Language Models (MLLMs) remain primarily reactive, failing to continuously perceive environments or proactively assist users. While emerging benchmarks address proactivity, they are largely confined to alert scenarios, neglect personalized context, and fail to evaluate the precise timing of human-machine interactions (HMI). In this paper, we