Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

arXiv:2510.20579v2 Announce Type: replace-cross Most video reasoning models generate only textual reasoning traces, without indicating when and where key evidence appears. Recent models such as OpenAI-o3 have sparked wide interest in evidence-centered reasoning for images, yet extending this ability to videos is challenging because it requires joint temporal tracking and spatial localization across dynamic scenes. We