Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition

arXiv cs.LG

We address the discounted reward setting in reinforcement learning (RL). To mitigate the value approximation challenges in policy gradient methods, actor-critic approaches have been developed and are known to converge to stationary points under suitable assumptions. In contrast, second-order optimization provides principled curvature-aware updates that are proven to accelerate convergence, but its application in RL is limited by the computation