Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding

arXiv:2605.07575v1 Announce Type: cross Proactive streaming video understanding requires Video-LLMs to decide when to respond as a video unfolds, a task where existing methods often fall short due to their implicit, query-agnostic modeling of visual evidence. We