AI RESEARCH
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
arXiv cs.LG
arXiv:2502.00204v3 Announce Type: replace We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round, the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader that achieve $O(T^{1/2})$ regret under bandit feedback, an improvement over the previously best-known rates of $O(T^{2/3