Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch

arXiv:2503.22244v2

Policy gradient methods are among the most successful approaches for solving challenging reinforcement learning problems. Despite their empirical success, many state-of-the-art policy gradient algorithms for discounted problems deviate from the policy gradient theorem because of a distribution mismatch. In this work, we analyze the impact of this mismatch on policy gradient methods.
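To make the notion of a distribution mismatch concrete, the following is a minimal sketch rather than the paper's own construction. It assumes the mismatch is between the discounted state distribution that appears in the policy gradient theorem and the undiscounted distribution from which practical algorithms effectively sample states. The undiscounted distribution is approximated here by the stationary distribution of the chain. The 3-state transition matrix `P`, the initial distribution `mu`, and `gamma = 0.9` are hypothetical values chosen only for illustration.

```python
import numpy as np


def discounted_state_distribution(P, mu, gamma):
    """Normalized discounted visitation: (1 - gamma) * mu^T (I - gamma P)^{-1}."""
    n = P.shape[0]
    # Solve (I - gamma P)^T x = mu, so that x^T = mu^T (I - gamma P)^{-1}.
    x = np.linalg.solve((np.eye(n) - gamma * P).T, mu)
    return (1.0 - gamma) * x


def stationary_distribution(P):
    """Distribution d with d P = d and sum(d) = 1 (assumes an ergodic chain)."""
    n = P.shape[0]
    # Stack the fixed-point equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


# Hypothetical state-transition matrix induced by some fixed policy.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
])
mu = np.array([1.0, 0.0, 0.0])  # hypothetical: always start in state 0
gamma = 0.9                     # hypothetical discount factor

d_gamma = discounted_state_distribution(P, mu, gamma)
d_stat = stationary_distribution(P)

# Total variation distance between the two state weightings.
tv = 0.5 * np.abs(d_gamma - d_stat).sum()

print("discounted :", np.round(d_gamma, 4))
print("stationary :", np.round(d_stat, 4))
print("TV distance:", round(float(tv), 4))
```

In this toy chain the discounted distribution concentrates on the start state, while the stationary distribution is uniform. A gradient estimate that weights states by one distribution while the theorem prescribes the other is therefore weighting the per-state terms differently.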