EvoGround: Self-Evolving Video Agents for Video Temporal Grounding

arXiv:2605.13803v1 Announce Type: new Video temporal grounding (VTG) takes an untrimmed video and a natural-language query as input and localizes the temporal moment that best matches the query. Existing methods rely on large, task-specific datasets requiring costly manual annotation. We