Provably Efficient Offline-to-Online Value Adaptation with General Function Approximation

arXiv:2604.13966v1 Announce Type: new We study value adaptation in offline-to-online reinforcement learning under general function approximation. Starting from an imperfect offline pretrained $Q$-function, the learner aims to adapt it to the target environment using only a limited amount of online interaction. We first characterize the difficulty of this setting by establishing a minimax lower bound, showing that even when the pretrained $Q$-function is close to the optimal $Q^\star$, online adaptation can be no more efficient than pure online RL on certain hard instances.