Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL

arXiv:2605.08417v1

Designing model-free algorithms for distributionally robust reinforcement learning (DRRL) poses fundamental challenges. The robust Bellman operator is nonlinear in the transition kernel, so Bellman updates built from a single sampled transition are biased. In addition, the adversarial optimization underlying robustness makes robust evaluation computationally demanding.
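The bias from nonlinearity can be seen in a small numerical sketch. The abstract does not name an uncertainty set, so the example below *assumes* a KL-divergence ball around the nominal kernel P0. It uses the standard dual form inf over P with KL(P||P0) ≤ δ of E_P[V], which equals the sup over β ≥ 0 of −β log E_P0[exp(−V/β)] − βδ. For simplicity it fixes β and drops the −βδ term. All function names are hypothetical. Because −log is convex, Jensen's inequality makes the plug-in estimate biased upward. With one sample, the empirical kernel is a point mass, and the estimate collapses to V(s'). Its mean is then the non-robust E_P0[V].

```python
import math
import random


def robust_value_fixed_beta(values, probs, beta):
    """Exact -beta * log E_P0[exp(-V/beta)] for a discrete nominal kernel P0.

    Assumption: KL-ball dual objective at a fixed dual variable beta
    (the -beta*delta term and the sup over beta are omitted).
    """
    m = sum(p * math.exp(-v / beta) for v, p in zip(values, probs))
    return -beta * math.log(m)


def plugin_estimate(values, probs, beta, n_samples, rng):
    """Plug-in estimate: replace P0 by the empirical kernel of n_samples next states."""
    idx = rng.choices(range(len(values)), weights=probs, k=n_samples)
    m_hat = sum(math.exp(-values[i] / beta) for i in idx) / n_samples
    return -beta * math.log(m_hat)


def mean_estimate(values, probs, beta, n_samples, n_trials, seed=0):
    """Monte Carlo average of the plug-in estimate, used to expose its bias."""
    rng = random.Random(seed)
    total = sum(plugin_estimate(values, probs, beta, n_samples, rng)
                for _ in range(n_trials))
    return total / n_trials


V = [0.0, 1.0, 4.0]   # hypothetical next-state values
P0 = [0.3, 0.5, 0.2]  # hypothetical nominal transition probabilities
beta = 1.0

exact = robust_value_fixed_beta(V, P0, beta)  # about 0.718
one = mean_estimate(V, P0, beta, n_samples=1, n_trials=20000)    # near E_P0[V] = 1.3
many = mean_estimate(V, P0, beta, n_samples=100, n_trials=2000)  # much closer to exact
print(exact, one, many)
```

Using more next-state samples per update shrinks the bias, but it raises the per-update sampling cost.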