MSRL: Scaling Generative Multimodal Reward Modeling via Multi-Stage Reinforcement Learning

arXiv:2603.25108v1 Announce Type: new Recent advances in multimodal reward modeling have been largely driven by a paradigm shift from discriminative to generative approaches. Building on this progress, recent studies have further employed reinforcement learning from verifiable rewards (RLVR) to enhance multimodal reward models (MRMs). Despite their success, RLVR-based