Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses

arXiv:2605.11535v1 Announce Type: new Existing work on linear constrained Markov decision processes (CMDPs) has primarily focused on stochastic settings, where the losses and costs are either fixed or drawn from fixed distributions. However, such formulations are inherently vulnerable to adversarially changing environments. To overcome this limitation, we propose a primal-dual policy optimization algorithm for online finite-horizon adversarial linear CMDPs, where the losses are adversarially chosen under full-information feedback and the costs are stochastic under bandit feedback.
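
To make the primal-dual idea concrete, the following is a minimal sketch of one generic primal-dual step on a toy single-state problem. It is not the algorithm proposed in the paper: it assumes a textbook exponentiated-gradient update for the policy and projected gradient ascent for the Lagrange multiplier. The names `primal_dual_step`, `budget`, `eta`, `alpha`, and `lam_max` are hypothetical and chosen only for illustration.

```python
import math


def primal_dual_step(policy, lam, loss, cost, budget,
                     eta=0.1, alpha=0.1, lam_max=10.0):
    """One generic primal-dual step (illustrative, not the paper's method).

    policy : list of action probabilities (sums to 1)
    lam    : current Lagrange multiplier, kept in [0, lam_max]
    loss   : per-action loss for this round (fully observed)
    cost   : per-action cost estimate for this round
    budget : assumed constraint threshold on the expected cost
    """
    # Primal update: exponentiated-gradient step on the Lagrangian
    # loss + lam * cost, followed by renormalization onto the simplex.
    weights = [p * math.exp(-eta * (l + lam * c))
               for p, l, c in zip(policy, loss, cost)]
    total = sum(weights)
    new_policy = [w / total for w in weights]

    # Dual update: gradient ascent on the constraint violation of the
    # current policy, projected back onto [0, lam_max].
    violation = sum(p * c for p, c in zip(policy, cost)) - budget
    new_lam = min(lam_max, max(0.0, lam + alpha * violation))
    return new_policy, new_lam
```

In this sketch the multiplier grows while the expected cost exceeds the assumed budget, which in turn penalizes costly actions in later primal updates. The adversarial losses enter only through the `loss` argument, which may change arbitrarily from round to round.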