AI RESEARCH

Efficient learning by implicit exploration in bandit problems with side observations

arXiv CS.LG

arXiv:2604.24555v1 Announce Type: new

We consider online learning problems under a partial observability model that captures situations where the information conveyed to the learner lies between full information and bandit feedback. In the simplest variant, we assume that, in addition to its own loss, the learner also observes the losses of some other actions. The revealed losses depend on the learner's action and on a directed observation system chosen by the environment.
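The feedback model in the simplest variant can be illustrated with a minimal sketch. It is not taken from the paper: the function name `revealed_losses`, the adjacency-list encoding of the observation system, and the loss values are all assumptions made for illustration. An edge from action i to action j is read as "playing i also reveals the loss of j".

```python
from typing import Dict, List, Sequence


def revealed_losses(
    losses: Sequence[float],
    observation_graph: Dict[int, List[int]],
    action: int,
) -> Dict[int, float]:
    """Return the losses the learner sees after playing `action`.

    Hypothetical helper: the learner always sees its own loss, plus the
    loss of every action that `action` points to in the directed graph.
    """
    observed = {action} | set(observation_graph.get(action, ()))
    return {arm: losses[arm] for arm in sorted(observed)}


# Illustrative losses for three actions in a single round (made-up values).
losses = [0.2, 0.9, 0.5]

# No side observations: this reduces to bandit feedback.
bandit = {0: [], 1: [], 2: []}

# Every action reveals every other action: this is full information.
full = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# An intermediate, asymmetric case: 0 reveals 1, but 1 reveals nothing extra.
partial = {0: [1], 1: [], 2: [0, 1]}

print(revealed_losses(losses, bandit, 1))
print(revealed_losses(losses, full, 0))
print(revealed_losses(losses, partial, 0))
```

The two extreme graphs recover the endpoints named in the abstract, and the asymmetric graph shows why the observation system is directed: observing j from i does not imply observing i from j.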