On the Divergence of Differential Temporal Difference Learning without Local Clocks

ArXi:2605.06874v1 Announce Type: new Learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $\alpha_t$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $\alpha_{\nu(S_t, t)}$, where $\nu(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also convergent with a global clock, and vice versa.