A Forensic Analysis of Synthetic Data in RL: Diagnosing and Solving Algorithmic Failures in Model-Based Policy Optimization

arXiv:2510.01457v4 Announce Type: replace

Synthetic data is central to data-efficient Dyna-style model-based reinforcement learning, but it can also degrade performance. We study this failure in Model-Based Policy Optimization (MBPO), which performs actor-critic updates using model-generated synthetic state transitions. Although MBPO reports strong sample-efficiency gains on OpenAI Gym, recent work shows that it often underperforms Soft Actor-Critic (SAC), its non-Dyna base algorithm, on the DeepMind Control Suite (DMC), even though both suites consist of MuJoCo-based proprioceptive continuous-control tasks.
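
To make the phrase "model-generated synthetic state transitions" concrete, the following is a minimal sketch of Dyna-style data generation, assuming short rollouts of a learned model that branch from previously observed real states. It is not the authors' implementation: `branched_rollouts`, `toy_model`, and `toy_policy` are hypothetical names, and the linear dynamics and policy are toy stand-ins chosen only so the example runs.

```python
import random
from typing import Callable, List, Tuple

# (state, action, reward, next_state) for a 1-D toy problem
Transition = Tuple[float, float, float, float]


def branched_rollouts(
    real_states: List[float],
    model: Callable[[float, float], Tuple[float, float]],
    policy: Callable[[float, random.Random], float],
    horizon: int,
    rng: random.Random,
) -> List[Transition]:
    """Roll the model forward `horizon` steps from each real state.

    Every returned transition is synthetic: it comes from the model,
    not from the environment.
    """
    synthetic: List[Transition] = []
    for s in real_states:
        for _ in range(horizon):
            a = policy(s, rng)
            s_next, r = model(s, a)
            synthetic.append((s, a, r, s_next))
            s = s_next  # continue the rollout from the predicted state
    return synthetic


def toy_model(s: float, a: float) -> Tuple[float, float]:
    # Hypothetical stand-in for a learned dynamics and reward model.
    s_next = 0.9 * s + a
    return s_next, -abs(s_next)


def toy_policy(s: float, rng: random.Random) -> float:
    # Hypothetical stochastic policy: push toward zero, plus Gaussian noise.
    return -0.5 * s + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    real_states = [1.0, -2.0, 0.5]  # pretend these came from a replay buffer
    batch = branched_rollouts(real_states, toy_model, toy_policy, horizon=2, rng=rng)
    print(len(batch))  # 3 start states x 2 steps = 6 synthetic transitions
```

In a Dyna-style method, a batch like this would be fed to the actor-critic update alongside, or instead of, real environment transitions. Any error in the model therefore flows directly into the policy and value updates.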