AI RESEARCH
Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference
arXiv CS.LG
•
ArXi:2503.07555v3 Announce Type: replace We study multi-armed bandits under network interference, where each unit's reward depends on its own treatment and those of its neighbors in a given graph. This induces an exponentially large action space, making standard approaches computationally impractical. We propose a novel algorithm that uses the local graph structure to minimize regret. We derive a graph-dependent upper bound on cumulative regret that improves over prior work.