AI RESEARCH

Conformal-Style Quantile Analyses for Stochastic Bandits

arXiv CS.LG

arXiv:2605.07115v1 Announce Type: new Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance. We study this upper-tail setting. For a fixed miscoverage level \(\alpha\), the natural upper-tail target of arm \(j\) is the upper endpoint \(F_j^{-1}(1-\alpha/2)\) of a central prediction interval. This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective.
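
The ranking mismatch can be illustrated with a small sketch. It is not taken from the paper: the two Gaussian arms, their parameters, the arm names, and the choice \(\alpha = 0.1\) are all assumptions made for illustration, and \(F_j\) is read here as the reward distribution function of arm \(j\). The sketch computes the upper endpoint \(F_j^{-1}(1-\alpha/2)\) for each arm and compares the arm it prefers with the arm preferred by the mean.

```python
from statistics import NormalDist

ALPHA = 0.1  # illustrative miscoverage level (assumption, not from the paper)

# Hypothetical arms: Gaussian reward distributions chosen only for illustration.
arms = {
    "steady": NormalDist(mu=1.0, sigma=0.1),    # higher mean, narrow spread
    "volatile": NormalDist(mu=0.8, sigma=1.0),  # lower mean, heavy upper tail
}


def upper_tail_target(dist, alpha):
    """Upper endpoint F^{-1}(1 - alpha/2) of the central prediction interval."""
    return dist.inv_cdf(1 - alpha / 2)


# Rank the arms under the classical mean criterion and under the quantile target.
best_by_mean = max(arms, key=lambda name: arms[name].mean)
best_by_quantile = max(arms, key=lambda name: upper_tail_target(arms[name], ALPHA))

for name, dist in arms.items():
    target = upper_tail_target(dist, ALPHA)
    print(f"{name}: mean={dist.mean:.3f}, upper-tail target={target:.3f}")
print("best by mean:", best_by_mean)
print("best by upper-tail target:", best_by_quantile)
```

Under these assumed parameters the mean criterion prefers the low-variance arm while the upper-tail target prefers the high-variance arm, which is the kind of disagreement the abstract describes.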