
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

arXiv cs.CV

arXiv:2603.12262v1 Announce Type: new

Online Video Large Language Models (VideoLLMs) play a critical role in enabling responsive, real-time interaction. Existing methods focus on streaming perception and lack a synchronized stream of logical reasoning. Yet directly applying test-time scaling methods to add such reasoning incurs unacceptable response latency. To address this trade-off, we propose Video Streaming Thinking (VST), a novel paradigm for streaming video understanding. It introduces a thinking-while-watching mechanism, which activates reasoning over incoming video clips during streaming.
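The abstract gives no implementation details, so the following is only a toy Python sketch of the latency trade-off as described. In one design, reasoning is triggered for each incoming clip and stored, so that when a question arrives only the final answer step remains. In the other, all reasoning is deferred until the question arrives, and the user waits for every reasoning call. The class names, the `_reason` stand-in, and the use of "model calls after the query" as a latency proxy are all hypothetical assumptions made for illustration. They are not VST's actual architecture or API.

```python
from typing import List


class ThinkWhileWatching:
    """Toy illustration only; names and structure are hypothetical, not VST's code."""

    def __init__(self) -> None:
        self.notes: List[str] = []      # reasoning accumulated while watching
        self.calls_after_query = 0      # model calls the user has to wait for

    def _reason(self, clip: str) -> str:
        # Hypothetical stand-in for one LLM reasoning call over a clip plus earlier notes.
        return f"note on {clip} (after {len(self.notes)} earlier notes)"

    def on_clip(self, clip: str) -> None:
        # Reasoning runs as each clip arrives, before any question exists.
        self.notes.append(self._reason(clip))

    def on_query(self, question: str) -> str:
        # Only the final answer call runs after the question.
        self.calls_after_query += 1
        return f"answer to {question!r} using {len(self.notes)} notes"


class ThinkAfterQuery(ThinkWhileWatching):
    """Hypothetical baseline: buffer clips and do all reasoning once a question arrives."""

    def __init__(self) -> None:
        super().__init__()
        self.pending: List[str] = []

    def on_clip(self, clip: str) -> None:
        self.pending.append(clip)       # no reasoning yet, just buffer the clip

    def on_query(self, question: str) -> str:
        for clip in self.pending:       # every reasoning call now delays the answer
            self.notes.append(self._reason(clip))
            self.calls_after_query += 1
        self.pending.clear()
        return super().on_query(question)  # plus the final answer call


if __name__ == "__main__":
    clips = [f"clip_{i}" for i in range(5)]
    for model in (ThinkWhileWatching(), ThinkAfterQuery()):
        for clip in clips:
            model.on_clip(clip)
        model.on_query("What just happened?")
        print(type(model).__name__, model.calls_after_query)
    # ThinkWhileWatching 1
    # ThinkAfterQuery 6
```

Both variants end up with the same notes and answer. Only the amount of work left after the question arrives differs, which is the latency cost the abstract attributes to directly applying test-time scaling.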