Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games

arXiv:2604.04394v1 Announce Type: new Reinforcement learning has been successful both empirically and theoretically in single-agent settings, but extending these results to multi-agent reinforcement learning in general-sum Markov games remains challenging. This paper studies the convergence of Stackelberg Q-value iteration in two-player general-sum Markov games from a control-theoretic perspective. We