Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret

arXiv:2605.11586v1

We study infinite-horizon average-reward constrained Markov decision processes (CMDPs) under the weakly communicating assumption. Our contributions are twofold. First, we establish strong duality for weakly communicating average-reward CMDPs with finite state and action spaces, where the optimization is over stationary policies. In the weakly communicating setting the problem admits no linear programming formulation and is therefore nonconvex; we show that strong duality nevertheless holds by carefully exploiting the geometric structure of the set of occupation measures.
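
For orientation, the strong duality claim can be written in a generic Lagrangian form. This is only a sketch under assumed notation that does not appear in the abstract: $\Pi_S$ is the set of stationary policies, $J_r(\pi)$ and $J_c(\pi)$ are the long-run average reward and average cost of $\pi$, $b$ is a cost budget, and a single constraint $J_c(\pi) \le b$ is assumed. The paper's exact formulation (number of constraints, sign conventions, dependence on the initial state) may differ.

```latex
% Assumed primal problem: maximize average reward subject to one average-cost constraint.
\[
  \sup_{\pi \in \Pi_S} \; J_r(\pi)
  \quad \text{subject to} \quad J_c(\pi) \le b .
\]
% Assumed Lagrangian with multiplier \lambda \ge 0.
\[
  L(\pi, \lambda) \;=\; J_r(\pi) + \lambda \bigl( b - J_c(\pi) \bigr).
\]
% Strong duality: the primal value (left) equals the dual value (right).
\[
  \sup_{\pi \in \Pi_S} \; \inf_{\lambda \ge 0} \; L(\pi, \lambda)
  \;=\;
  \inf_{\lambda \ge 0} \; \sup_{\pi \in \Pi_S} \; L(\pi, \lambda).
\]
```

The inequality "left-hand side $\le$ right-hand side" (weak duality) always holds; the content of a strong duality result is the reverse inequality, which usually follows from convexity and is therefore nontrivial in a nonconvex setting such as the one described above.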