Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits

ArXi:2510.22819v2 Announce Type: replace The convergence analysis of online learning algorithms is central to machine learning theory, where the last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms.