AI RESEARCH

Narrative Aligned Long Form Video Question Answering

arXiv CS.CV

arXiv:2603.19481v1 Announce Type: new

Recent progress in multimodal large language models (MLLMs) has led to a surge of benchmarks for long-video reasoning. However, most existing benchmarks rely on localized cues and fail to capture narrative reasoning: the ability to track intentions, connect distant events, and reconstruct causal chains across an entire movie. We