Data Deletion Can Help in Adaptive RL

arXiv:2605.00298v1 Announce Type: new Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context that is unknown at test time. The standard approach decomposes the problem: train a so-called "universal policy" that assumes knowledge of the true context, then pair it with a context estimator that approximates the context from the observed trajectory.
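
The decomposition described above can be illustrated with a minimal sketch. This is not the paper's method or environment: the toy dynamics `x' = x + context * action`, the class names `ToyCMDP` and `LeastSquaresContextEstimator`, and the functions `universal_policy` and `run_episode` are all hypothetical, chosen only to show a context-conditioned policy paired with an estimator that infers the context from observed transitions.

```python
from dataclasses import dataclass


@dataclass
class ToyCMDP:
    """Hypothetical 1-D environment: x' = x + context * action."""
    context: float       # unknown to the agent at test time
    state: float = 1.0

    def step(self, action: float) -> float:
        self.state = self.state + self.context * action
        return self.state


def universal_policy(state: float, context: float) -> float:
    """Context-conditioned policy (assumed form).

    Drives the state to zero in one step when `context` is the true value.
    """
    return -state / context


@dataclass
class LeastSquaresContextEstimator:
    """Estimates the context from observed (state, action, next_state) tuples."""
    prior: float = 1.0   # guess used before any informative data arrives
    _num: float = 0.0
    _den: float = 0.0

    def update(self, state: float, action: float, next_state: float) -> None:
        # Least-squares fit of (next_state - state) = context * action.
        self._num += action * (next_state - state)
        self._den += action * action

    def estimate(self) -> float:
        return self._num / self._den if self._den > 0 else self.prior


def run_episode(true_context: float, steps: int = 5) -> tuple[float, float]:
    """Run the estimator and the universal policy together in a loop."""
    env = ToyCMDP(context=true_context)
    estimator = LeastSquaresContextEstimator()
    for _ in range(steps):
        state = env.state
        # The policy only ever sees the estimated context, never the true one.
        action = universal_policy(state, estimator.estimate())
        next_state = env.step(action)
        estimator.update(state, action, next_state)
    return estimator.estimate(), env.state
```

Calling `run_episode(true_context=2.0)` returns the final context estimate and the final state. In this sketch the estimator pools every transition it has seen, which is adequate for a fixed context; a time-varying context would require some rule for which past transitions to keep, and that rule is left out here.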