Unlearning Offline Stochastic Multi-Armed Bandits

arXiv:2605.00638v1. Machine unlearning aims to remove the influence of specified data from a learned system. Prior work has mainly studied unlearning in supervised and unsupervised learning, leaving unlearning for sequential decision-making systems far less understood. We initiate the study of unlearning for a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality.
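To make the setting concrete, the following is a minimal sketch of an offline MAB decision rule together with the simplest unlearning baseline, retraining from scratch on the retained data. It is an illustration under stated assumptions, not the method proposed in this work: the greedy empirical-mean rule, the function names (`empirical_best_arm`, `retrain_without`, `suboptimality`), and the toy dataset are all hypothetical. Utility is measured here as the gap between the best true mean and the true mean of the arm selected after unlearning, which is one plausible reading of "post-unlearning decision quality".

```python
def empirical_best_arm(dataset, num_arms):
    """Pick the arm with the highest empirical mean reward.

    `dataset` is a list of (arm, reward) pairs collected offline.
    This greedy rule is an assumption made for illustration only.
    """
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    for arm, reward in dataset:
        totals[arm] += reward
        counts[arm] += 1
    # Arms with no samples get -inf so they are never preferred
    # over an arm that has at least one observation.
    means = [
        totals[a] / counts[a] if counts[a] else float("-inf")
        for a in range(num_arms)
    ]
    return max(range(num_arms), key=lambda a: means[a])


def retrain_without(dataset, forget_indices, num_arms):
    """Exact unlearning baseline: drop the forgotten samples and rerun."""
    forget = set(forget_indices)
    retained = [s for i, s in enumerate(dataset) if i not in forget]
    return empirical_best_arm(retained, num_arms)


def suboptimality(chosen_arm, true_means):
    """Gap between the best true mean and the chosen arm's true mean."""
    return max(true_means) - true_means[chosen_arm]


# Hypothetical toy instance: three arms with known true means.
true_means = [0.2, 0.5, 0.8]
dataset = [
    (0, 0.0), (0, 1.0),
    (1, 1.0), (1, 0.0), (1, 1.0),
    (2, 1.0), (2, 1.0),
]

before = empirical_best_arm(dataset, num_arms=3)
# Forget both samples of arm 2 (indices 5 and 6).
after = retrain_without(dataset, forget_indices=[5, 6], num_arms=3)

print("arm before unlearning:", before)
print("arm after unlearning:", after)
print("post-unlearning gap:", suboptimality(after, true_means))
```

In this toy instance, deleting every sample of the best arm forces the retrained rule onto a worse arm, so the deletion request alone degrades decision quality. Retraining from scratch trivially removes the forgotten samples' influence, which makes it a natural reference point when comparing the utility of cheaper unlearning procedures.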