Reward-Conditioned Reinforcement Learning

arXiv:2603.05066v2 Announce Type: replace Single-task RL agents are typically trained under a fixed reward function, which limits their robustness to reward misspecification and their ability to adapt to changing preferences. We