Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

arXiv:2605.13641v1 Announce Type: new

Complex reinforcement learning environments frequently employ multi-task and mixed-reward formulations. In these settings, heterogeneous reward distributions and correlated reward dimensions often destabilize the construction of scalar advantages. To address these challenges, we propose Reward-Decorrelated Policy Optimization (RDPO), a reward-processing method designed to explicitly target both failure modes.