Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models

arXiv:2605.07872v1 Announce Type: cross Multimodal reward models have advanced substantially in text and image domains, yet progress in video understanding reward modeling remains severely limited by the lack of robust evaluation benchmarks and high-quality preference data. To address this, we propose a unified framework spanning benchmark design, data construction, and reward model