Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values
arXiv cs.LG
arXiv:2603.00945v3 Announce Type: replace-cross We study non-rectangular robust Markov decision processes under the average-reward criterion, where the ambiguity set couples transition probabilities across states and the adversary commits to a stationary kernel for the entire horizon.
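
For orientation, the setting described above can be sketched as a max-min problem. The notation here (policy class \Pi, reward r, ambiguity set \mathcal{P}, per-state sets \mathcal{P}_s, optimal robust gain g^\star) is assumed for illustration and is not taken from the paper; the paper's own definitions, for instance the policy class or the use of liminf, may differ.

```latex
% Assumed notation, not the paper's: a sketch of the robust average-reward objective.
\[
g^\star \;=\; \sup_{\pi \in \Pi}\; \inf_{P \in \mathcal{P}}\;
\liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\pi}_{P}\!\left[\, \sum_{t=0}^{T-1} r(s_t, a_t) \right],
\qquad
\mathcal{P} \;\neq\; \prod_{s \in \mathcal{S}} \mathcal{P}_s \ \text{ in general.}
\]
```

In this sketch the kernel P is chosen once and held fixed for every time step t, which reflects the adversary committing to a stationary kernel. The second relation expresses non-rectangularity: the ambiguity set need not factor into independent per-state sets, so the adversary's choice of transition probabilities at one state can constrain its choice at another.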