Online learning with Erdős-Rényi side-observation graphs

arXiv CS.LG

arXiv:2604.25271v1 Announce Type: cross

We consider adversarial multi-armed bandit problems where the learner is allowed to observe the losses of a number of arms besides the arm that it actually chose. We study the case where all non-chosen arms reveal their loss with a fixed but unknown probability $r$, independently of each other and of the learner's action. We propose two algorithms that work for different ranges of $r
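
The observation model described in the abstract can be illustrated with a minimal sketch of a single round. This is not one of the paper's algorithms; it only simulates what the environment reveals. The function name `observe_round`, its parameters, and the example loss values are hypothetical and chosen for illustration. The learner does not know `r`; it appears here only on the environment's side.

```python
import random


def observe_round(losses, chosen, r, rng):
    """Simulate one round of side observations (illustrative sketch).

    losses: list of per-arm losses for this round (set by the adversary)
    chosen: index of the arm the learner played
    r:      reveal probability, known to the environment but not the learner
    rng:    a random.Random instance, passed in for reproducibility
    Returns a dict mapping each observed arm to its loss.
    """
    # The loss of the chosen arm is always observed.
    observed = {chosen: losses[chosen]}
    # Each non-chosen arm reveals its loss independently with probability r.
    for arm, loss in enumerate(losses):
        if arm != chosen and rng.random() < r:
            observed[arm] = loss
    return observed


rng = random.Random(0)
losses = [0.2, 0.9, 0.5, 0.1]  # assumed example values
obs = observe_round(losses, chosen=2, r=0.5, rng=rng)
```

With `r = 0` this reduces to standard bandit feedback (only the chosen arm is seen), and with `r = 1` it reduces to full-information feedback (every arm is seen).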