Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

arXiv:2605.00393v1

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While recent advances have explored offline oracle-efficient algorithms, their computational complexity typically scales with the cardinality of the state and action spaces, rendering them intractable for large-scale or continuous environments.