Static and Dynamic Graph Alignment Network for Temporal Video Grounding

arXiv:2605.00684v1

Temporal Video Grounding (TVG) aims to localize the temporal moments in an untrimmed video that semantically correspond to a given natural language query. Recently, Graph Convolutional Networks (GCNs) have been widely adopted in TVG: by constructing clip-level graphs, they model temporal relations among video clips and enhance contextual reasoning.
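
To make the notion of a clip-level graph concrete, the following is a minimal sketch and not the method of this paper. It assumes that each clip is a node, that edges link clips lying within a fixed temporal window (the `window` parameter is hypothetical), and that one propagation step simply averages neighboring features. A real GCN layer would additionally apply a learned weight matrix and a nonlinearity, both omitted here. The function names `build_clip_graph` and `propagate` are illustrative only.

```python
def build_clip_graph(num_clips, window=1):
    """Adjacency matrix with self-loops: clip i is linked to clip j
    when they are at most `window` steps apart (an assumed rule)."""
    return [
        [1.0 if abs(i - j) <= window else 0.0 for j in range(num_clips)]
        for i in range(num_clips)
    ]


def propagate(adj, feats):
    """One row-normalized propagation step: each clip's new feature is
    the mean of its neighbors' features (no learned weights)."""
    dim = len(feats[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1 because of the self-loop
        out.append([
            sum(w * feats[j][d] for j, w in enumerate(row)) / degree
            for d in range(dim)
        ])
    return out


# Toy clip features (hypothetical values): 4 clips, 2 dimensions each.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
adj = build_clip_graph(len(feats), window=1)
smoothed = propagate(adj, feats)
```

After this step each clip's representation mixes in context from its temporal neighbors, which is the kind of contextual reasoning that the graph-based TVG models mentioned above rely on.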