Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

arXiv:2603.16065v1

Reinforcement learning (RL) has shown great potential for refining robotic manipulation policies, yet its effectiveness remains strongly bottlenecked by the difficulty of designing generalizable reward functions. In this paper, we propose a framework for online policy refinement that adapts foundation vision-language models (VLMs) into online reward generators.