AI RESEARCH

Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values

arXiv CS.LG

arXiv:2603.00945v3 Announce Type: replace-cross We study non-rectangular robust Markov decision processes under the average-reward criterion, where the ambiguity set couples transition probabilities across states and the adversary commits to a stationary kernel for the entire horizon.
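
As a rough sketch of the setting described above, the objective can be written as a max-min problem over policies and kernels. The notation here is assumed for illustration and is not taken from the paper: $\pi$ is a policy, $P$ a transition kernel, $\mathcal{P}$ the ambiguity set, $r$ the reward function, and $g^{\pi}(P)$ the long-run average reward. The policy class over which the supremum is taken is left unspecified.

```latex
% Illustrative notation only (assumed, not the paper's own symbols).
% Average reward of policy \pi when the adversary fixes a single
% stationary kernel P for the whole horizon:
\[
  g^{\pi}(P) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{P}\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right]
\]
% Robust objective: the controller maximizes the worst case over \mathcal{P}.
\[
  g^{\star} \;=\; \sup_{\pi}\; \inf_{P \in \mathcal{P}} g^{\pi}(P)
\]
% Non-rectangularity: \mathcal{P} need not factor into per-state sets,
% so the adversary's choices at different states are coupled.
\[
  \mathcal{P} \;\neq\; \prod_{s \in \mathcal{S}} \mathcal{P}_s
  \quad \text{in general}
\]
```

Under a rectangular set the inner infimum would decompose state by state. The coupling across states is what removes that decomposition here.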