Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs

arXiv:2605.11694v1 Announce Type: new We study policy optimization for infinite-horizon, discounted constrained Markov decision processes (CMDPs). Existing theoretical guarantees typically hold only for the mixture policy, but deploying such a policy is computationally and memory intensive. This creates a mismatch with practice, where a single (last-iterate) policy must be deployed. Recent theoretical works have therefore focused on proving last-iterate convergence, but their results are largely limited to the tabular setting or to algorithmic variants that are rarely used in practice.
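The following is a minimal sketch of why the mixture policy is costly to deploy and the last iterate is not. It assumes tabular policies stored as nested lists (`policy[s][a]` is the probability of action `a` in state `s`) and a uniform mixture over all iterates, which is one common choice assumed here for illustration; the paper's mixture weights may differ. The class names `MixturePolicy` and `LastIteratePolicy` are hypothetical and are not the authors' code.

```python
import random
from typing import List

# Hypothetical tabular policy: policy[s][a] = probability of action a in state s.
Policy = List[List[float]]


class MixturePolicy:
    """Uniform mixture over all iterates pi_1..pi_T (illustrative assumption).

    Every iterate must be kept in memory; one iterate is drawn at the start
    of each episode and followed for the whole trajectory.
    """

    def __init__(self, iterates: List[Policy], seed: int = 0):
        self.iterates = iterates
        self.rng = random.Random(seed)
        self.current = iterates[-1]

    def reset(self) -> None:
        # Sample which stored iterate governs the next episode.
        self.current = self.rng.choice(self.iterates)

    def act(self, state: int) -> int:
        probs = self.current[state]
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def num_stored_params(self) -> int:
        # Memory grows linearly with the number of iterates T.
        return sum(len(p) * len(p[0]) for p in self.iterates)


class LastIteratePolicy:
    """Deploys only the final iterate pi_T."""

    def __init__(self, iterates: List[Policy], seed: int = 0):
        self.policy = iterates[-1]
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        probs = self.policy[state]
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def num_stored_params(self) -> int:
        # Memory is independent of T.
        return len(self.policy) * len(self.policy[0])


if __name__ == "__main__":
    # 100 iterates of a 3-state, 2-action policy (dummy values).
    iterates = [[[0.5, 0.5], [1.0, 0.0], [0.2, 0.8]] for _ in range(100)]
    mixture = MixturePolicy(iterates)
    last = LastIteratePolicy(iterates)
    print(mixture.num_stored_params(), last.num_stored_params())
```

With 100 iterates of a 3-state, 2-action policy, the mixture stores 600 probabilities while the last iterate stores 6; in function-approximation settings the same linear growth applies to full parameter vectors, which is what makes last-iterate guarantees practically relevant.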