Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

arXiv:2605.01752v1

We study linear dueling bandits in volatile environments where post-serving contexts, delayed feedback, and adversarial corruption are present simultaneously. Feedback arrives after unknown stochastic or adversarial delays and may be adversarially corrupted, subject to a cumulative corruption budget $\mathcal{C}$. To address these challenges, we propose \term, which integrates a learned approximator that predicts post-serving contexts from pre-serving information.
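
To make the idea of a learned approximator concrete, the following is a minimal sketch, not the method from the paper. It assumes the post-serving context is roughly a linear function of the pre-serving context and fits that map by ridge regression; it also assumes a logistic (Bradley-Terry style) preference model for duels. The names `fit_context_approximator`, `duel_probability`, `W_true`, and all dimensions are hypothetical, and delays and corruption are not modeled here.

```python
import numpy as np

def fit_context_approximator(X_pre, Z_post, reg=1.0):
    """Assumed approximator: ridge regression so that z_hat = x @ W."""
    d = X_pre.shape[1]
    A = X_pre.T @ X_pre + reg * np.eye(d)
    return np.linalg.solve(A, X_pre.T @ Z_post)

def duel_probability(theta, feat_a, feat_b):
    """Assumed logistic link: probability that arm a is preferred over arm b."""
    return 1.0 / (1.0 + np.exp(-(feat_a - feat_b) @ theta))

# Synthetic data (illustrative only): 3-dim pre-serving, 2-dim post-serving context.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 2))
X_pre = rng.normal(size=(500, 3))
Z_post = X_pre @ W_true + 0.01 * rng.normal(size=(500, 2))

W_hat = fit_context_approximator(X_pre, Z_post, reg=1e-3)

# Build full features for two arms by appending the predicted post-serving context.
x_a, x_b = rng.normal(size=3), rng.normal(size=3)
feat_a = np.concatenate([x_a, x_a @ W_hat])
feat_b = np.concatenate([x_b, x_b @ W_hat])
theta = rng.normal(size=5)  # hypothetical preference parameter
p_a_beats_b = duel_probability(theta, feat_a, feat_b)
```

In this sketch the learner can score a duel before the post-serving context is observed by substituting the prediction `x @ W_hat`; a robust variant would additionally need to handle delayed and corrupted feedback, which is outside the scope of this illustration.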