MINERVA-Cultural: A Benchmark for Cultural and Multilingual Long Video Reasoning

arXiv:2601.10649v2 Announce Type: replace Recent advancements in video models have shown tremendous progress, particularly in long video understanding. However, current benchmarks predominantly feature Western-centric data and English as the dominant language,