Multi-Armed Bandits With Best-Action Queries

arXiv:2605.08287v1 Announce Type: cross We study \emph{multi-armed bandits} (MABs) augmented with \emph{best-action queries}, in which the learner may additionally query an oracle that reveals the best arm in the current round. This setting was recently characterized by Russo in the \emph{full-feedback} model, where the learner observes the rewards of all arms after each round.
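
As an illustration only, the interaction described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the names `run_protocol`, `choose_arm`, and `should_query` are hypothetical, the learner is assumed to pull the arm the oracle reveals, and feedback is assumed to be bandit feedback (only the pulled arm's reward is observed), in contrast to the full-feedback model. How often queries may be issued is not specified in the text above, so the query rule is left as a caller-supplied function.

```python
from typing import Callable, Sequence, Tuple


def best_arm(rewards: Sequence[float]) -> int:
    """Oracle answer: index of the arm with the highest reward this round."""
    return max(range(len(rewards)), key=lambda i: rewards[i])


def run_protocol(
    reward_rows: Sequence[Sequence[float]],
    choose_arm: Callable[[int, int], int],
    should_query: Callable[[int], bool],
) -> Tuple[float, int]:
    """Play every round once and return (total reward, number of queries).

    reward_rows[t][i] is the reward of arm i in round t (hypothetical input).
    choose_arm(t, n_arms) is the learner's own choice when it does not query.
    should_query(t) decides whether the oracle is consulted in round t.
    """
    total = 0.0
    queries = 0
    for t, rewards in enumerate(reward_rows):
        if should_query(t):
            # Assumption: the learner follows the oracle's revealed best arm.
            arm = best_arm(rewards)
            queries += 1
        else:
            arm = choose_arm(t, len(rewards))
        # Bandit feedback: only the pulled arm's reward is collected and seen.
        total += rewards[arm]
    return total, queries


if __name__ == "__main__":
    rows = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    # Learner that always pulls arm 0 and queries only in round 1.
    print(run_protocol(rows, lambda t, n: 0, lambda t: t == 1))
```

Querying in every round recovers the best arm each time, while never querying reduces the sketch to an ordinary bandit loop; the interesting regime lies between these two extremes.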