A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning

arXiv:2605.06866v1 Announce Type: new Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime: it updates a single state asynchronously at each iteration and, in online settings, is driven by a Markovian trajectory.
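
To make the asynchronous regime concrete, the following is a minimal sketch of a single-state categorical TD update driven by one trajectory. It is an illustration under stated assumptions, not the paper's exact algorithm or analysis setting: the evenly spaced support, the constant step size `alpha`, the two-state toy chain, and the helper names `project_target`, `async_ctd_step`, and `run_demo` are all hypothetical choices made for this example.

```python
import random


def project_target(probs_next, support, reward, gamma):
    """Project the atoms r + gamma * z onto a fixed, evenly spaced support.

    Each shifted atom is clipped to the support range and its mass is split
    linearly between the two neighbouring support points.
    """
    k = len(support)
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (k - 1)
    projected = [0.0] * k
    for p, z in zip(probs_next, support):
        target = min(max(reward + gamma * z, v_min), v_max)
        pos = (target - v_min) / delta
        lower = min(int(pos), k - 1)
        upper = min(lower + 1, k - 1)
        weight_upper = pos - lower
        projected[lower] += p * (1.0 - weight_upper)
        projected[upper] += p * weight_upper
    return projected


def async_ctd_step(dists, state, reward, next_state, support, gamma, alpha):
    """Update only the visited state's categorical distribution, in place."""
    target = project_target(dists[next_state], support, reward, gamma)
    dists[state] = [
        (1.0 - alpha) * p + alpha * t for p, t in zip(dists[state], target)
    ]


def run_demo(num_steps=2000, seed=0):
    """Run the update along one trajectory of a toy two-state chain (assumed)."""
    rng = random.Random(seed)
    support = [0.5 * i for i in range(21)]  # atoms 0.0, 0.5, ..., 10.0
    num_states = 2
    uniform = [1.0 / len(support)] * len(support)
    dists = [list(uniform) for _ in range(num_states)]
    state = 0
    for _ in range(num_steps):
        next_state = rng.randrange(num_states)  # toy transition kernel
        reward = 1.0 if state == 0 else 0.0     # toy reward function
        async_ctd_step(dists, state, reward, next_state, support,
                       gamma=0.9, alpha=0.05)
        state = next_state                      # follow the trajectory
    return support, dists
```

Each call to `async_ctd_step` touches only the row of `dists` for the state just visited, in contrast to a synchronous scheme that would refresh every state from fresh generative-model samples at each iteration. Because the update is a convex combination of two probability vectors, every row remains a valid distribution on the support.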