COOPO: Cyclic Offline-Online Policy Optimization Algorithm

arXiv:2605.18675v1 (announce type: new)

Offline reinforcement learning (RL) suffers from distributional shift, and its performance is capped by the limitations of a static dataset, while online RL demands a prohibitive number of environment interactions. Recent hybrid offline-to-online methods bridge the two settings but suffer from distribution drift during the transition and from catastrophic forgetting of offline knowledge. We