Multi-objective Reinforcement Learning With Augmented States Requires Rewards After Deployment

arXiv:2604.15757v1 Announce Type: new This research note identifies a previously overlooked distinction between multi-objective reinforcement learning (MORL) and conventional single-objective reinforcement learning (RL). It has previously been noted that the optimal policy for an MORL agent with a non-linear utility function must be conditioned on both the current environmental state and some measure of the previously accrued reward.
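
As a rough illustration of the augmented-state idea, the sketch below shows a policy that takes both the environment state and the accrued reward vector as input. Everything in it is an assumption made for illustration and is not taken from the note: the single-state toy environment, the `ACTION_REWARDS` table, the `min`-based utility, and the one-step greedy action choice (which is not claimed to be optimal). It shows that the same environment state can call for different actions depending on the reward accrued so far, and that the accrued reward can only be tracked if the agent observes the reward at every step.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, float]

# Hypothetical toy problem: one environment state, two actions,
# each action yields a fixed two-objective reward vector.
ACTION_REWARDS: Dict[int, Vector] = {0: (1.0, 0.0), 1: (0.0, 1.0)}


def utility(returns: Vector) -> float:
    """Illustrative non-linear utility: the value of the worst objective."""
    return min(returns)


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two reward vectors."""
    return (a[0] + b[0], a[1] + b[1])


def augmented_policy(state: int, accrued: Vector) -> int:
    """Choose an action from the augmented state (state, accrued reward).

    One-step greedy rule (an assumption, not an optimal policy):
    pick the action whose reward, added to the accrued reward,
    gives the highest utility. Ties go to the lowest action index.
    """
    return max(
        sorted(ACTION_REWARDS),
        key=lambda action: utility(add(accrued, ACTION_REWARDS[action])),
    )


def rollout(steps: int) -> Tuple[List[int], Vector]:
    """Run the policy, updating the accrued reward from observed rewards."""
    state = 0  # the toy environment never leaves this state
    accrued: Vector = (0.0, 0.0)
    actions: List[int] = []
    for _ in range(steps):
        action = augmented_policy(state, accrued)
        reward = ACTION_REWARDS[action]  # reward must be observed here
        accrued = add(accrued, reward)   # otherwise this update is impossible
        actions.append(action)
    return actions, accrued


if __name__ == "__main__":
    print(rollout(4))
```

In this sketch the policy alternates between the two actions even though the environment state never changes, because its input includes the accrued reward. Without access to `reward` inside the loop, `accrued` could not be updated and the policy would lose the information it is conditioned on.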