AI RESEARCH

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding

arXiv CS.AI

arXiv:2605.13228v1 Announce Type: cross

Video understanding requires active evidence seeking, motivating tool-augmented video agents for temporal reasoning, cross-modal understanding, and complex question answering. Existing video agents have improved video reasoning with retrieval, memory, frame inspection, and verifier tools, but they still face two limitations: (1) a coarse tool space that lacks fine-grained operations for compositional reasoning; and (2) a flat action space that forces high-level video intents into primitive executable tool calls.