MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding

arXiv:2603.22756v1 Announce Type: new The rapid progress of Large Language Models (LLMs) has spurred growing interest in Multi-modal LLMs (MLLMs) and motivated the development of benchmarks to evaluate their perceptual and comprehension abilities. Existing benchmarks, however, are limited to static images or single videos, overlooking the complex interactions across multiple videos. To address this gap, we