Cross-fitted Proximal Learning for Model-Based Reinforcement Learning

arXiv:2604.05185v1

Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational data may be biased. This challenge is especially pronounced in partially observable systems, where latent factors may jointly affect actions, rewards, and future observations.