Reward Prediction with Factorized World States

ArXi:2603.09400v1 Announce Type: new Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could