AI RESEARCH

Optimal Regret for Single Index Bandits

arXiv CS.LG

arXiv:2605.09454v1 Announce Type: cross We study the $\textit{single-index bandit}$ problem, where rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting, and is particularly relevant when the reward function is not known in advance.
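
To make the reward model concrete, the following is a minimal simulation sketch, not the paper's algorithm. It assumes the expected reward takes the form $f(\langle x, \theta \rangle)$ with additive Gaussian noise. The names `make_single_index_env`, `link`, `noise_std`, and the choice of `np.tanh` as the reward function are illustrative assumptions, not taken from the source. The learner would see only contexts and noisy rewards, never `theta` or `link`.

```python
import numpy as np


def make_single_index_env(dim, link, noise_std=0.1, seed=0):
    """Build a toy single-index reward environment (hypothetical helper).

    Assumed model: reward = link(<context, theta>) + Gaussian noise,
    where both theta and link are hidden from the learner.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)
    theta /= np.linalg.norm(theta)  # unit-norm index direction

    def expected_reward(contexts):
        # One-dimensional projection of each context, passed through the link.
        return link(contexts @ theta)

    def pull(context):
        # Noisy reward observed by the learner for a single context.
        return float(link(context @ theta) + noise_std * rng.normal())

    return theta, expected_reward, pull


# Illustrative setup: 5 candidate arms with 20-dimensional contexts.
theta, expected_reward, pull = make_single_index_env(dim=20, link=np.tanh)
contexts = np.random.default_rng(1).normal(size=(5, 20))

means = expected_reward(contexts)
best_arm = int(np.argmax(means))
# Instantaneous regret of playing arm 0 instead of the best arm.
regret_of_arm_0 = float(means[best_arm] - means[0])
observed = pull(contexts[0])
```

In this sketch, setting `link` to the identity recovers a linear bandit, and fixing a known `link` corresponds to a generalized linear bandit. The single-index setting treats `link` as unknown as well.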