Optimal Regret for Single Index Bandits
arXiv cs.LG
arXiv:2605.09454v1 Announce Type: cross. We study the $\textit{single-index bandit}$ problem, in which rewards depend on high-dimensional contexts only through an unknown one-dimensional projection, which is then passed through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting and is particularly relevant in applications where the form of the reward function cannot be specified in advance.
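To make the model concrete, here is a minimal Python sketch of a single-index reward oracle. It assumes rewards of the form $r = f(\theta^\top x) + \varepsilon$. The names `make_single_index_env` and `instantaneous_regret`, the unit-norm normalization of $\theta$, and the Gaussian noise $\varepsilon$ are illustrative assumptions, not details taken from the paper. In the bandit problem the learner sees only noisy rewards and never observes $\theta$ or $f$. A linear bandit corresponds to $f(z) = z$, and a generalized linear bandit corresponds to a known link such as the logistic function. The single-index model leaves $f$ unknown.

```python
import math
import random


def make_single_index_env(theta, link, noise_sd=0.1, seed=0):
    """Hypothetical single-index reward model: r(x) = link(<theta, x>) + noise.

    theta:    index direction (normalized to unit length here, an assumption).
    link:     the one-dimensional reward function f, hidden from the learner.
    noise_sd: standard deviation of the assumed Gaussian reward noise.
    """
    norm = math.sqrt(sum(t * t for t in theta))
    theta = [t / norm for t in theta]
    rng = random.Random(seed)

    def mean_reward(x):
        # The expected reward depends on x only through the projection <theta, x>.
        return link(sum(t * xi for t, xi in zip(theta, x)))

    def pull(x):
        # One noisy observation, which is all a bandit learner gets to see.
        return mean_reward(x) + rng.gauss(0.0, noise_sd)

    return mean_reward, pull


def instantaneous_regret(arms, chosen, mean_reward):
    """Standard per-round regret: best available expected reward minus the chosen one."""
    best = max(mean_reward(x) for x in arms)
    return best - mean_reward(arms[chosen])


# Three choices of f: linear bandit, logistic (GLM) bandit, and a nonlinear f.
identity = lambda z: z
logistic = lambda z: 1.0 / (1.0 + math.exp(-z))
nonlinear = lambda z: math.sin(3.0 * z)  # hypothetical nonparametric example

arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
for name, f in [("linear", identity), ("logistic", logistic), ("sin", nonlinear)]:
    mean_reward, _ = make_single_index_env([3.0, 4.0], f, noise_sd=0.1, seed=1)
    regrets = [round(instantaneous_regret(arms, i, mean_reward), 3) for i in range(len(arms))]
    print(name, regrets)
```

With $\theta \propto (3, 4)$, the third arm is aligned with $\theta$ and is best under the linear and logistic links. Under the non-monotone `sin` link, the first arm is best instead. This is why a learner that cannot assume the shape of $f$ has to estimate it from data rather than simply maximizing $\theta^\top x$.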