Offline Two-Player Zero-Sum Markov Games with KL Regularization

arXiv:2605.13025v1 Announce Type: new We study the problem of learning Nash equilibria in offline two-player zero-sum Markov games. While existing approaches often rely on explicit pessimism to address distribution shift, we show that KL regularization alone suffices to stabilize learning and guarantee convergence. We first