AI RESEARCH

Regret minimization in Linear Bandits with offline data via extended D-optimal exploration

arXiv cs.LG

arXiv:2508.08420v3 Announce Type: replace We consider the problem of online regret minimization in linear bandits with access to prior observations (offline data) from the underlying bandit model. Extensive offline data is often available in applications such as recommendation systems and online advertising, and the problem has consequently been studied intensively in recent literature. Our algorithm, Offline-Online Phased Elimination (OOPE), effectively incorporates the offline data to substantially reduce the online regret compared to prior work.
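
To give a rough sense of why offline data can help in a linear bandit, the sketch below computes the standard elliptical confidence width sqrt(a^T V^{-1} a) for each arm, where V is a regularized design matrix built from offline feature vectors. This is not the OOPE algorithm, whose details the abstract does not give. The two-dimensional arms, the offline sample, the regularizer `reg`, and all function names are hypothetical choices made only for illustration.

```python
import math

def design_matrix(features, reg=1.0):
    """Return V = reg * I + sum of x x^T for 2-D feature vectors."""
    v = [[reg, 0.0], [0.0, reg]]
    for x in features:
        for i in range(2):
            for j in range(2):
                v[i][j] += x[i] * x[j]
    return v

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def width(arm, v_inv):
    """Confidence width sqrt(a^T V^{-1} a) of an arm under the design V."""
    q = sum(arm[i] * v_inv[i][j] * arm[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

# Hypothetical offline data: 99 observations, all of the first arm.
offline = [(1.0, 0.0)] * 99
arms = [(1.0, 0.0), (0.0, 1.0)]

v_inv = inverse_2x2(design_matrix(offline))
widths = [width(a, v_inv) for a in arms]

for arm, w in zip(arms, widths):
    print(arm, round(w, 3))
```

In this toy setup the arm covered by the offline data has a much smaller width than the uncovered one, so an online learner would only need to spend exploration on the second direction. How a learner should allocate that exploration is the design question that the title's D-optimal exploration refers to.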