Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration

arXiv:2506.01250v2. In this paper, we address the contextual dueling bandit problem by proposing variance-aware algorithms that use neural networks to approximate nonlinear utility functions. Our approach employs a variance-aware exploration strategy that adaptively accounts for uncertainty in pairwise comparisons while relying only on the gradients with respect to the learnable parameters of the last layer.
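
To make the idea of exploring through last-layer gradients concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a utility model of the form f(x) = w · phi(x), so that the gradient of f with respect to the last-layer weights w is simply the hidden representation phi(x). It also assumes a variance-weighted covariance update in which each comparison is weighted by the inverse of an estimated Bernoulli variance p(1 - p). All names (`last_layer_features`, `pair_bonus`, `variance_weighted_update`, `var_floor`) and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def last_layer_features(x, W_hidden):
    # Hidden representation phi(x) of a one-hidden-layer ReLU network.
    # For f(x) = w @ phi(x), this is also the gradient of f w.r.t. w.
    return np.maximum(W_hidden @ x, 0.0)


def pair_bonus(z, V_inv):
    # Elliptical uncertainty width for the feature difference z of a pair.
    return float(np.sqrt(z @ V_inv @ z))


def variance_weighted_update(V, z, p_hat, var_floor=1e-2):
    # Assumption: weight each comparison by the inverse of its estimated
    # Bernoulli variance p(1 - p), clipped below by a hypothetical floor.
    var_hat = max(p_hat * (1.0 - p_hat), var_floor)
    return V + np.outer(z, z) / var_hat


rng = np.random.default_rng(0)
d_in, d_hidden = 4, 8                      # illustrative sizes
W_hidden = rng.normal(size=(d_hidden, d_in))
w_last = np.zeros(d_hidden)                # untrained last layer
V = np.eye(d_hidden)                       # ridge regulariser, lambda = 1

# One round: compare two arms using only last-layer quantities.
x1, x2 = rng.normal(size=d_in), rng.normal(size=d_in)
z = last_layer_features(x1, W_hidden) - last_layer_features(x2, W_hidden)
p_hat = sigmoid(w_last @ z)                # estimated P(arm 1 beats arm 2)

bonus_before = pair_bonus(z, np.linalg.inv(V))
V = variance_weighted_update(V, z, p_hat)
bonus_after = pair_bonus(z, np.linalg.inv(V))
```

Because the covariance matrix has the dimension of the last layer rather than of the full parameter vector, inverting it stays cheap even for deep networks. Under the weighting assumed here, comparisons whose estimated variance is small add more to the covariance matrix, so the exploration bonus along their feature direction shrinks faster.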