From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

arXiv:2504.18743v2

This work presents the first finite-time analysis of last-iterate convergence for average-reward $Q$-learning with an asynchronous implementation. A key feature of the algorithm we study is its use of adaptive stepsizes, which serve as local clocks for each state-action pair. We show that, under appropriate assumptions, the iterates generated by this $Q$-learning algorithm converge in the mean-square sense to the optimal $Q$-function in the span seminorm at a rate of $\tilde{\mathcal{O}}(1/k)$.
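To make the "local clock" idea concrete, the following is a minimal sketch rather than the paper's exact algorithm. It assumes a tabular MDP, a uniformly random behavior policy, a stepsize of the form $1/(N(s,a)+h)$ where $N(s,a)$ counts visits to the pair $(s,a)$, and a relative-value-style offset that subtracts the value of a fixed reference pair. The names `avg_reward_q_learning`, `span`, `P`, `R`, and `h` are hypothetical and chosen only for illustration.

```python
import numpy as np


def span(x):
    """Span seminorm: largest entry minus smallest entry (zero for constant vectors)."""
    return float(np.max(x) - np.min(x))


def avg_reward_q_learning(P, R, num_steps, h=1.0, seed=0):
    """Asynchronous average-reward Q-learning sketch with per-pair stepsizes.

    P: transition probabilities, shape (n_states, n_actions, n_states).
    R: rewards, shape (n_states, n_actions).
    h: assumed positive offset in the stepsize 1 / (N(s, a) + h).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions), dtype=int)  # local clocks N(s, a)
    s = 0
    for _ in range(num_steps):
        a = rng.integers(n_actions)               # assumed uniform behavior policy
        s_next = rng.choice(n_states, p=P[s, a])  # sample one transition
        visits[s, a] += 1
        # The stepsize depends only on how often this pair has been updated,
        # not on the global iteration counter.
        alpha = 1.0 / (visits[s, a] + h)
        # Assumed relative-value offset: subtract Q at the reference pair (0, 0).
        target = R[s, a] + Q[s_next].max() - Q[0, 0]
        Q[s, a] += alpha * (target - Q[s, a])     # only the visited pair changes
        s = s_next
    return Q, visits
```

Because the optimal $Q$-function in the average-reward setting is identified only up to an additive constant, an error such as `span(Q - Q_star)` is the natural quantity to track, where `Q_star` would be a reference solution computed separately.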