Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

arXiv:2408.16286v5 Announce Type: replace Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments.
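
For orientation, the RCMDP objective described above can be written as a constrained min-max problem. The notation here is an assumption for illustration and is not taken from the abstract: $\mathcal{U}$ denotes the set of environments, $J_{c_n,P}(\pi)$ the cumulative cost of policy $\pi$ under cost function $c_n$ in environment $P$, and $b_n$ the threshold for the $n$-th constraint. Applying the worst case to the objective as well as to the constraints is also an assumption about how the abstract should be read.

```latex
% Sketch of an RCMDP under assumed notation (not the paper's own statement):
% c_0 is the objective cost, c_1, ..., c_N are constraint costs,
% b_1, ..., b_N are constraint thresholds, and U is the uncertainty set.
\begin{aligned}
\min_{\pi} \quad & \max_{P \in \mathcal{U}} J_{c_0, P}(\pi) \\
\text{s.t.} \quad & \max_{P \in \mathcal{U}} J_{c_n, P}(\pi) \le b_n,
\qquad n = 1, \dots, N .
\end{aligned}
```

Under this reading, a policy is feasible only if it meets every threshold in every environment in $\mathcal{U}$, and a near-optimal policy is one whose worst-case objective cost is close to the minimum over feasible policies.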