Generalizable Dense Reward for Long-Horizon Robotic Tasks

arXiv:2604.00055v1 Announce Type: cross Existing robotic foundation policies are trained primarily via large-scale imitation learning. While such models demonstrate strong capabilities, they often struggle with long-horizon tasks due to distribution shift and error accumulation. Reinforcement learning (RL) can fine-tune these models, but it does not work well across diverse tasks without manual reward engineering.