A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

arXiv:2406.14753v4 Announce Type: replace
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish several theoretical properties of this approach, including convergence and optimality of our analogs of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a gradient ascent algorithm based on this theorem, all developed within a specific control-theoretic framework.