Advantage-Guided Diffusion for Model-Based Reinforcement Learning

arXiv:2604.09035v1

Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing guidance schemes for diffusion world models are either policy-only, which discards value information, or reward-based, which becomes myopic when the diffusion horizon is short. We