AI RESEARCH

Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning

arXiv CS.LG

arXiv:2512.10510v2 Announce Type: replace

Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, which often rely on a fixed data-mixing ratio, struggle to manage the trade-off between early learning stability and asymptotic performance. To overcome this, we