Model-Based Reinforcement Learning under Random Observation Delays

arXiv:2509.20869v2. Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in partially observable Markov decision processes (POMDPs), where observations may arrive out of sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance.
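
To make the out-of-sequence setting concrete, the following is a minimal illustrative sketch, not the paper's method. It assumes that the observation emitted at step t arrives at step t + d_t for some non-negative integer delay d_t. The helper names `arrival_schedule` and `arrival_order` and the example delays are hypothetical.

```python
from collections import defaultdict


def arrival_schedule(delays):
    """Map each arrival step to the emission steps of the observations arriving then.

    delays[t] is the (assumed integer, non-negative) delay of the observation
    emitted at step t, so that observation arrives at step t + delays[t].
    """
    arrivals = defaultdict(list)
    for t, d in enumerate(delays):
        arrivals[t + d].append(t)
    return dict(arrivals)


def arrival_order(delays):
    """Return emission steps in the order the agent actually receives them."""
    schedule = arrival_schedule(delays)
    return [t for step in sorted(schedule) for t in schedule[step]]


# Hypothetical delays for five consecutive observations.
delays = [2, 0, 3, 0, 1]
print(arrival_schedule(delays))
print(arrival_order(delays))
```

With these example delays the agent receives nothing at steps 0 and 4, receives two observations at step 5, and sees the observation from step 1 before the one from step 0. A buffer that simply stacks the most recently received observations would therefore mix emission times without recording which is which.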