When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift

arXiv:2603.04648v2

Real-world reinforcement learning systems must operate under distributional drift in their observation streams, yet most policy architectures implicitly assume fully observed, noise-free states. We study the robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures that induce partial observability and representation shift.
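
To make "temporally persistent sensor failures" concrete, the sketch below simulates one plausible failure process. It is an illustration only: the abstract does not specify the failure model, so the two-state Markov chain per sensor, the function names `persistent_failure_mask` and `corrupt`, and the parameters `p_fail`, `p_recover`, and `fill_value` are all assumptions introduced here. Persistence comes from a failed sensor staying failed until it recovers, rather than dropping out independently at each step.

```python
import numpy as np


def persistent_failure_mask(steps, n_sensors, p_fail=0.05, p_recover=0.2, seed=0):
    """Return a (steps, n_sensors) boolean array; True means the sensor works.

    Hypothetical failure model (an assumption, not taken from the paper):
    each sensor is an independent two-state Markov chain.
    """
    rng = np.random.default_rng(seed)
    working = np.ones(n_sensors, dtype=bool)  # all sensors start healthy
    mask = np.empty((steps, n_sensors), dtype=bool)
    for t in range(steps):
        u = rng.random(n_sensors)  # uniform draws in [0, 1)
        # A healthy sensor stays healthy with probability 1 - p_fail;
        # a failed sensor recovers with probability p_recover.
        working = np.where(working, u >= p_fail, u < p_recover)
        mask[t] = working
    return mask


def corrupt(observations, mask, fill_value=0.0):
    """Replace readings from failed sensors with fill_value (assumed placeholder)."""
    return np.where(mask, observations, fill_value)
```

With a small `p_recover`, failures last many steps, so a policy that sees only the current observation cannot tell a failed sensor from a genuine zero reading. That is the partial observability the abstract refers to, and it is one reason a model that conditions on observation history could help.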