Evaluation-Aware Reinforcement Learning

arXiv:2509.19464v3 Announce Type: replace
Policy evaluation is a core component of many reinforcement learning (RL) algorithms and a critical tool for ensuring safe deployment of RL policies. However, existing policy evaluation methods often suffer from high variance or bias. To address these issues, we