Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning

arXiv:2510.03181v2 Announce Type: replace
We study Non-Stationary Reinforcement Learning (RL) under distribution shifts in both finite-horizon episodic and infinite-horizon discounted Markov Decision Processes (MDPs). In the finite-horizon case, the transition functions may change abruptly at a particular episode. In the infinite-horizon setting, such changes can occur at an arbitrary time step during the agent's interaction with the environment.
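To make the finite-horizon shift model concrete, here is a minimal sketch assuming a tabular MDP whose transition kernel switches exactly once, at an episode index `change_episode`. The class `ShiftingEpisodicMDP` and its parameters are hypothetical illustrations only; they are neither the paper's environment nor its shift-aware Q-learning algorithm, and a learning agent would not be told `change_episode`.

```python
import numpy as np


class ShiftingEpisodicMDP:
    """Hypothetical tabular finite-horizon MDP with one abrupt transition shift.

    Episodes with index < change_episode use P_before; episodes from
    change_episode onward use P_after. Rewards stay fixed across the shift.
    """

    def __init__(self, P_before, P_after, R, horizon, change_episode, seed=0):
        # P_before, P_after: shape (S, A, S); P[s, a] is a distribution over next states.
        self.P_before = np.asarray(P_before, dtype=float)
        self.P_after = np.asarray(P_after, dtype=float)
        # R: shape (S, A), deterministic reward for taking action a in state s.
        self.R = np.asarray(R, dtype=float)
        self.H = horizon
        self.change_episode = change_episode
        self.rng = np.random.default_rng(seed)
        self.episode = -1  # becomes 0 on the first reset()
        self.t = 0
        self.state = 0

    def kernel(self):
        # Select the transition kernel that is active in the current episode.
        if self.episode < self.change_episode:
            return self.P_before
        return self.P_after

    def reset(self):
        # Start a new episode from the fixed initial state 0.
        self.episode += 1
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action):
        # Collect the reward, sample the next state from the active kernel,
        # and end the episode after H steps.
        P = self.kernel()
        reward = self.R[self.state, action]
        num_states = P.shape[2]
        self.state = int(self.rng.choice(num_states, p=P[self.state, action]))
        self.t += 1
        done = self.t >= self.H
        return self.state, reward, done
```

In this sketch the shift is tied to the episode counter, matching the finite-horizon description; for the infinite-horizon discounted case one would instead trigger the switch on a global time-step counter and drop the episode boundary.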