AI RESEARCH
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
arXiv CS.LG
•
ArXi:2604.15242v1 Announce Type: new We study the problem of learning minimax policies in zero-sum matrix games. Fiegel recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound on the exploitability gap of Omega(t^{-1/4}). Some online mirror descent algorithms were proposed in the literature for this problem, but none have truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, allows this O-tilde(t^{-1/4}) convergence with high-probability.