SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

arXiv:2512.13874v2 Announce Type: replace

As humans, we are natural any-horizon reasoners: depending on the task, we can decide whether to skim a long video iteratively or watch a short one in full. With this in mind, one would expect video reasoning models to reason flexibly across different durations. However, SOTA models are still trained to predict answers in a single turn while processing a large number of frames, which is akin to watching an entire long video and requires significant resources.