Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions

arXiv:2605.09363v1

Last-iterate convergence of learning dynamics in games has attracted significant recent attention. In two-player zero-sum games with bandit feedback, where only the loss of the selected action pair is observed, prior work by Fiegel shows a separation between average-iterate and last-iterate convergence in the duality gap. Standard no-regret algorithms achieve the optimal t^(-1/2) rate after t rounds for the average iterate. The last iterate, however, cannot converge faster than t^(-1/3) in expectation or t^(-1/4) with high probability.
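To make the setting concrete, here is a minimal sketch of the quantities the abstract refers to. It is not the paper's algorithm. It assumes a loss matrix A with entries in [0, 1], where the row player minimizes x^T A y and the column player maximizes it. The duality gap of a strategy pair (x, y) is max_j (x^T A)_j - min_i (A y)_i, which is zero exactly at a Nash equilibrium. The self-play loop uses plain Exp3 (exponential weights on importance-weighted estimates, with uniform exploration `gamma`) as a stand-in "standard no-regret algorithm" under bandit feedback. Each round, only the single entry A[i][j] is revealed. The function names and the parameters `eta`, `gamma`, and `seed` are hypothetical choices made for illustration.

```python
import math
import random


def duality_gap(A, x, y):
    """Duality gap of (x, y): row player minimizes x^T A y, column player maximizes it."""
    n, m = len(A), len(A[0])
    # Value of each pure column against x; the column player's best response takes the max.
    col_vals = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
    # Value of each pure row against y; the row player's best response takes the min.
    row_vals = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    return max(col_vals) - min(row_vals)


def _softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _mix(p, gamma):
    # Blend with the uniform distribution so every action keeps probability >= gamma / k.
    k = len(p)
    return [(1 - gamma) * pi + gamma / k for pi in p]


def exp3_self_play(A, rounds, eta=0.05, gamma=0.05, seed=0):
    """Both players run Exp3 and observe only the loss A[i][j] of the sampled pair.

    Returns the last iterate (x, y) and the average iterate (avg_x, avg_y).
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    est_loss_x = [0.0] * n  # cumulative importance-weighted loss estimates (row player)
    est_gain_y = [0.0] * m  # cumulative importance-weighted gain estimates (column player)
    avg_x, avg_y = [0.0] * n, [0.0] * m
    x, y = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(rounds):
        x = _mix(_softmax([-eta * s for s in est_loss_x]), gamma)
        y = _mix(_softmax([eta * s for s in est_gain_y]), gamma)
        i = rng.choices(range(n), weights=x)[0]
        j = rng.choices(range(m), weights=y)[0]
        loss = A[i][j]  # bandit feedback: the only number either player sees
        est_loss_x[i] += loss / x[i]  # unbiased estimate of row i's loss against y
        est_gain_y[j] += loss / y[j]  # unbiased estimate of column j's gain against x
        avg_x = [a + xi / rounds for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / rounds for a, yj in zip(avg_y, y)]
    return (x, y), (avg_x, avg_y)


if __name__ == "__main__":
    # Matching pennies as a loss matrix: the row player loses 1 when the actions match.
    A = [[1.0, 0.0], [0.0, 1.0]]
    (x, y), (ax, ay) = exp3_self_play(A, rounds=5000)
    print("last-iterate gap:   ", duality_gap(A, x, y))
    print("average-iterate gap:", duality_gap(A, ax, ay))
```

No-regret guarantees of this kind are statements about the average iterate. The last iterate of such dynamics can keep cycling around the equilibrium, which is the gap the abstract's lower bounds formalize. The printed numbers depend on the seed and step sizes and should not be read as evidence for any particular rate.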