Video-guided Machine Translation with Global Video Context

arXiv:2604.06789v1

Video-guided Multimodal Translation (VMT) has advanced significantly in recent years. However, most existing methods rely on locally aligned video segments paired one-to-one with subtitles, limiting their ability to capture global narrative context across multiple segments in long videos.