Bayesian Learning in Episodic Zero-Sum Games

arXiv:2603.20604v1. We study Bayesian learning in episodic, finite-horizon zero-sum Markov games with unknown transition and reward models. We investigate a posterior sampling algorithm in which each player maintains a Bayesian posterior over the game model, independently samples a game model at the beginning of each episode, and computes an equilibrium policy for the sampled model. We analyze two settings: (i) both players use the posterior sampling algorithm, and (ii) only one player uses posterior sampling while the opponent follows an arbitrary learning algorithm.
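
To make the per-episode loop concrete, the following is a minimal sketch of setting (i) on a toy game. It is an illustration under stated assumptions, not the construction analyzed in the paper. The names (`solve_2x2`, `sample_model`, `equilibrium_policies`, `run_self_play`) are hypothetical. The Dirichlet prior on transitions, the Beta prior on Bernoulli rewards, the shared observation history, and the restriction to two actions per player (so that each stage game has a closed-form solution instead of requiring a linear program) are all assumptions introduced here.

```python
import random

S, A, H = 3, 2, 4  # toy sizes (assumed): states, actions per player, horizon
KEYS = [(s, a, b) for s in range(S) for a in range(A) for b in range(A)]

def solve_2x2(M):
    """Value and equilibrium strategies (x: max/row player, y: min/column player)."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # a pure saddle point exists
        row = 0 if min(a, b) >= min(c, d) else 1
        col = 0 if max(a, c) <= max(b, d) else 1
        return maximin, [1.0 - row, float(row)], [1.0 - col, float(col)]
    denom = a - b - c + d  # nonzero whenever there is no saddle point
    p = min(1.0, max(0.0, (d - c) / denom))  # P(row 0), clamped against round-off
    q = min(1.0, max(0.0, (d - b) / denom))  # P(column 0)
    return p * a + (1.0 - p) * c, [p, 1.0 - p], [q, 1.0 - q]

def dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma variates."""
    g = [rng.gammavariate(x, 1.0) for x in alpha]
    total = sum(g)
    return [x / total for x in g]

def sample_model(counts, ab, rng):
    """One posterior sample: transition rows and mean rewards for every (s, a, b)."""
    P = {k: dirichlet(counts[k], rng) for k in KEYS}
    R = {k: rng.betavariate(ab[k][0], ab[k][1]) for k in KEYS}
    return P, R

def equilibrium_policies(P, R):
    """Backward induction: solve one 2x2 stage game per (step, state)."""
    V, policy = [0.0] * S, [None] * H
    for h in reversed(range(H)):
        new_V, stage = [], []
        for s in range(S):
            Q = [[R[s, a, b] + sum(p * v for p, v in zip(P[s, a, b], V))
                  for b in range(A)] for a in range(A)]
            value, x, y = solve_2x2(Q)
            new_V.append(value)
            stage.append((x, y))
        V, policy[h] = new_V, stage
    return V, policy

def run_self_play(episodes=50, seed=0):
    rng = random.Random(seed)
    # Hidden true model, drawn at random purely for illustration.
    true_P = {k: dirichlet([1.0] * S, rng) for k in KEYS}
    true_R = {k: rng.random() for k in KEYS}
    counts = {k: [1.0] * S for k in KEYS}  # Dirichlet(1, ..., 1) prior
    ab = {k: [1.0, 1.0] for k in KEYS}     # Beta(1, 1) prior
    for _ in range(episodes):
        # Each player draws its own sample from the posterior at episode start.
        _, pol_max = equilibrium_policies(*sample_model(counts, ab, rng))
        _, pol_min = equilibrium_policies(*sample_model(counts, ab, rng))
        s = 0
        for h in range(H):
            a = rng.choices(range(A), weights=pol_max[h][s][0])[0]
            b = rng.choices(range(A), weights=pol_min[h][s][1])[0]
            r = 1 if rng.random() < true_R[s, a, b] else 0
            s_next = rng.choices(range(S), weights=true_P[s, a, b])[0]
            counts[s, a, b][s_next] += 1.0  # conjugate posterior updates
            ab[s, a, b][0] += r
            ab[s, a, b][1] += 1 - r
            s = s_next
    return counts, ab
```

Calling `run_self_play(episodes=50)` returns the updated posterior parameters. Setting (ii) could be sketched by replacing the `pol_min` line with any other rule for choosing `b`. Games with more than two actions per player would need a linear-programming solver in place of `solve_2x2`.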