Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning

arXiv:2603.22430v1 Announce Type: new

Offline Reinforcement Learning (RL) aims to learn optimal policies from fixed datasets, without further interaction with the environment. Such methods train an offline policy (or value function) and apply it at inference time without further refinement. We