Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization

arXiv:2603.07698v1

We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on tabular policies or linear critics, which limits their applicability to high-dimensional and continuous control problems.