MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark

arXiv:2601.02536v2 Announce Type: replace Understanding real-world videos such as movies requires integrating visual and dialogue cues. Yet existing VideoQA benchmarks struggle to capture this multimodal reasoning and, given the difficulty of evaluating free-form answers, largely resort to simple multiple-choice questions. We