AI RESEARCH
On Gaussian approximation for entropy-regularized Q-learning with function approximation
arXiv CS.LG
arXiv:2605.17678v1 Announce Type: cross In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-\omega}$, $\omega \in (1/2,1
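
To make the objects named in the abstract concrete, the following is a minimal sketch of the kind of iteration it describes: a single-trajectory (asynchronous) Q-learning update with a soft, entropy-regularized Bellman target, a linear parameterization, a stepsize $k^{-\omega}$, and a running Polyak--Ruppert average. It is not the paper's exact algorithm or setting. The toy three-state MDP, the one-hot feature map (a special case of linear function approximation), the uniform behaviour policy, the log-sum-exp form of the regularized target, and all names and default values (`tau`, `gamma`, `omega=0.75`) are assumptions chosen for illustration.

```python
import math
import random


def soft_value(q_values, tau):
    """Soft (entropy-regularized) value: tau * log(sum_a exp(q_a / tau)).

    The maximum is subtracted first for numerical stability.
    """
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def averaged_soft_q_learning(n_steps, omega=0.75, gamma=0.9, tau=0.5, seed=0):
    """Return (last iterate, Polyak-Ruppert average) on a toy MDP (assumed)."""
    rng = random.Random(seed)
    n_states, n_actions = 3, 2
    dim = n_states * n_actions

    def phi(s, a):
        # Hypothetical one-hot feature map; any linear features could be used.
        v = [0.0] * dim
        v[s * n_actions + a] = 1.0
        return v

    def q(theta, s, a):
        # Linear parameterization Q(s, a) = <phi(s, a), theta>.
        return sum(t * f for t, f in zip(theta, phi(s, a)))

    def env_step(s, a):
        # Toy dynamics (assumed): follow the action w.p. 0.8, else jump randomly.
        if rng.random() < 0.8:
            s_next = (s + a) % n_states
        else:
            s_next = rng.randrange(n_states)
        reward = 1.0 if s_next == 0 else 0.0
        return reward, s_next

    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    s = 0
    for k in range(1, n_steps + 1):
        a = rng.randrange(n_actions)           # uniform behaviour policy (assumed)
        r, s_next = env_step(s, a)
        q_next = [q(theta, s_next, b) for b in range(n_actions)]
        td_error = r + gamma * soft_value(q_next, tau) - q(theta, s, a)
        alpha = k ** (-omega)                   # polynomial stepsize k^{-omega}
        f = phi(s, a)
        # Asynchronous update: only the visited (s, a) direction moves.
        theta = [t + alpha * td_error * fi for t, fi in zip(theta, f)]
        # Running mean of theta_1, ..., theta_k (Polyak-Ruppert average).
        theta_bar = [tb + (t - tb) / k for tb, t in zip(theta_bar, theta)]
        s = s_next
    return theta, theta_bar


if __name__ == "__main__":
    last, avg = averaged_soft_q_learning(20000)
    print("last iterate:    ", [round(x, 3) for x in last])
    print("averaged iterate:", [round(x, 3) for x in avg])
```

The averaged vector `theta_bar` is the quantity whose fluctuations a central limit theorem of this type would describe; the sketch only produces the iterates and does not estimate any covariance or convergence rate.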