An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

arXiv:2603.26647v1 Announce Type: new We study the stochastic multi-armed bandit (MAB) problem in which an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unknowns, such that selecting an action reveals observations for all the unknowns it is connected to. While previous work assumes that all actions are permanently accessible, we investigate the practical setting of stochastic availability, where the set of feasible actions (the "activation set") varies from round to round.
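The setting described above can be illustrated with a minimal simulation sketch. Everything named here is an assumption made for illustration and does not come from the paper: the toy graph `GRAPH`, the Bernoulli means `MEANS`, the independent per-action availability probabilities `AVAIL`, and the placeholder selection rule (play the available action linked to the most unknowns). In particular, this is not the LP-based sampling policy of the title; it only shows how a bipartite action-to-unknown graph produces side-observations when the activation set is redrawn each round.

```python
import random

# Hypothetical bipartite graph: each action maps to the unknowns it reveals.
GRAPH = {
    "a1": {"u1", "u2"},
    "a2": {"u2", "u3"},
    "a3": {"u3"},
}
# Hypothetical Bernoulli mean of each unknown (assumed for illustration).
MEANS = {"u1": 0.2, "u2": 0.5, "u3": 0.8}
# Hypothetical probability that each action is available in a round,
# assumed independent across actions and rounds.
AVAIL = {"a1": 0.9, "a2": 0.5, "a3": 0.7}


def sample_activation_set(rng):
    """Draw the set of actions that are feasible in this round."""
    return {a for a in sorted(AVAIL) if rng.random() < AVAIL[a]}


def play(action, rng):
    """Selecting an action reveals one sample of every linked unknown."""
    return {u: int(rng.random() < MEANS[u]) for u in sorted(GRAPH[action])}


def run(rounds, seed=0):
    """Simulate `rounds` rounds and count observations per unknown."""
    rng = random.Random(seed)
    counts = {u: 0 for u in MEANS}
    for _ in range(rounds):
        active = sample_activation_set(rng)
        if not active:
            continue  # no feasible action in this round
        # Placeholder rule (NOT the paper's policy): pick the available
        # action connected to the most unknowns, ties broken by name.
        action = max(sorted(active), key=lambda a: len(GRAPH[a]))
        for u in play(action, rng):
            counts[u] += 1
    return counts


if __name__ == "__main__":
    print(run(1000, seed=1))
```

Under this placeholder rule, `u1` is observed only through `a1`, which also reveals `u2`, so `u2` is never observed less often than `u1`. A policy designed for this setting would instead choose among the available actions so as to balance what each unknown still needs to be observed.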