Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

arXiv:2402.05689v4 Announce Type: replace

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal, with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, under only unichain and aperiodicity assumptions.
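
To make the two conditions concrete, here is a minimal sketch of how one might test whether a finite Markov chain is unichain (exactly one recurrent class, possibly with transient states) and aperiodic. The abstract does not say which chain the assumption is imposed on, so the function takes any row-stochastic transition matrix; the name `is_unichain_aperiodic` and the tolerance `tol` are hypothetical and do not come from the paper.

```python
from collections import deque
from math import gcd


def is_unichain_aperiodic(P, tol=1e-12):
    """Return True if the chain with transition matrix P (list of rows)
    has a single recurrent class and that class has period 1.
    Illustrative helper; not taken from the paper."""
    n = len(P)
    # Directed edge i -> j whenever the transition probability is positive.
    adj = [[j for j in range(n) if P[i][j] > tol] for i in range(n)]

    # reach[i][j]: state j is reachable from state i in zero or more steps.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in adj[i]:
            reach[i][j] = True
    for k in range(n):  # Warshall-style transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True

    # A state is recurrent iff every state it can reach can reach it back.
    recurrent = [i for i in range(n)
                 if all(reach[j][i] for j in range(n) if reach[i][j])]
    root = recurrent[0]
    # Unichain: all recurrent states lie in the class of `root`.
    if any(not reach[root][s] for s in recurrent):
        return False

    # Period of the recurrent class: gcd of (level[u] + 1 - level[v])
    # over all edges u -> v, using BFS levels from `root`.
    level = {root: 0}
    queue = deque([root])
    period = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            period = gcd(period, level[u] + 1 - level[v])
    return period == 1
```

For example, a two-state chain that deterministically alternates between its states is unichain but has period 2, so the check returns False, while adding any self-loop to it makes the check return True.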