Online Statistical Inference of Constant Sample-averaged Q-Learning

arXiv:2603.26982v1 Announce Type: cross

Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, their performance can suffer from high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework for performing online statistical inference on a sample-averaged Q-learning approach.