AI RESEARCH
SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data
arXiv cs.LG
arXiv:2605.05863v1 Announce Type: new Incorporating prior data into online reinforcement learning accelerates