VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

arXiv:2505.23359v2 Announce Type: replace Recent studies have shown that long chain-of-thought (CoT) reasoning can significantly enhance the performance of large language models (LLMs) on complex tasks. However, this benefit is yet to be demonstrated in the domain of video understanding, since most existing benchmarks lack the reasoning depth required to demonstrate the advantages of extended CoT chains. While recent efforts have proposed benchmarks aimed at video reasoning, the tasks are often knowledge-driven and do not rely heavily on visual content. To bridge this gap, we