Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework

arXiv:2605.10671v1 Announce Type: new
In this work, we show that natural policy gradient, a core algorithm in reinforcement learning, admits an exact formulation as a smoothed and averaged form of policy iteration. Specifically, we
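As a point of reference only (this is not the authors' Bellman-operator construction, which the excerpt does not give), the standard tabular NPG update under a softmax parametrization is known to take the closed form pi_{t+1}(a|s) ∝ pi_t(a|s) exp(eta Q^{pi_t}(s,a)), up to a rescaling of the step size eta. Policy iteration jumps straight to the greedy policy argmax_a Q^{pi_t}(s,a); NPG instead reweights the current policy toward it, and recovers the greedy step as eta grows. Unrolling the update gives pi_t ∝ pi_0 exp(eta sum_{k<t} Q^{pi_k}), an accumulation of all past Q-functions. The sketch below assumes a hypothetical 2-state, 2-action MDP with exact policy evaluation; the MDP and the names `q_values`, `npg_step` and `greedy` are illustrative, not taken from the paper.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper).
# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(0, 1.0)], [(1, 1.0)]]]
R = [[0.0, 1.0],
     [0.0, 2.0]]
GAMMA = 0.9


def q_values(pi, iters=2000):
    """Evaluate Q^pi by repeatedly applying the Bellman expectation operator."""
    n_s, n_a = len(R), len(R[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        v = [sum(pi[s][a] * q[s][a] for a in range(n_a)) for s in range(n_s)]
        q = [[R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a])
              for a in range(n_a)] for s in range(n_s)]
    return q


def npg_step(pi, eta):
    """Softmax NPG step: pi'(a|s) proportional to pi(a|s) * exp(eta * Q^pi(s,a))."""
    q = q_values(pi)
    new_pi = []
    for s, row in enumerate(pi):
        m = max(q[s])  # subtract the max Q-value for numerical stability
        w = [p * math.exp(eta * (qa - m)) for p, qa in zip(row, q[s])]
        z = sum(w)
        new_pi.append([x / z for x in w])
    return new_pi


def greedy(pi):
    """Policy iteration's improvement step: put all mass on argmax_a Q^pi(s,a)."""
    q = q_values(pi)
    out = []
    for qs in q:
        best = max(range(len(qs)), key=lambda a: qs[a])
        out.append([1.0 if a == best else 0.0 for a in range(len(qs))])
    return out


if __name__ == "__main__":
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(greedy(uniform))            # -> [[0.0, 1.0], [0.0, 1.0]]
    print(npg_step(uniform, 0.1))     # small eta: a mild move toward the greedy policy
    print(npg_step(uniform, 1000.0))  # large eta: essentially the greedy policy
```

With eta = 0 the policy is unchanged, and as eta grows the step approaches the hard argmax of policy iteration; in this reading, the finite step size is what "smooths" the greedy update.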