EvoStreaming: Your Offline Video Model Is a Natively Streaming Assistant

ArXi:2605.10343v1 Announce Type: cross Streaming video understanding demands than watching longer videos: assistants must decide when to speak in real time, balancing responsiveness against verbosity. Yet most video-language models (VideoLLMs) are trained for offline inference, and existing streaming benchmarks externalize this timing decision to the evaluator. We address this gap with RealStreamEval, a frame-level multi-turn evaluation protocol that exposes models to sequential observations and penalizes unnecessary responses.