Active Context Selection Improves Simple Regret in Contextual Bandits

arXiv:2605.20040v1 Announce Type: new We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector $p