AI RESEARCH
Extreme bandits
arXiv cs.LG
arXiv:2604.24545v1 Announce Type: cross In many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values. In this paper, we study an efficient way to allocate these resources sequentially under limited feedback. While sequential design of experiments is well studied in bandit theory, the most commonly optimized property is the regret with respect to the maximum mean reward.