Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs

ArXi:2605.01242v1 Announce Type: new Reinforcement learning (RL) is a fundamental framework for sequential decision-making, in which an agent learns an optimal policy through interactions with an unknown environment. In settings with function approximation, many existing RL algorithms achieve favorable sample complexity, but often rely on computationally intractable oracles. In this paper, we use supervised learning as a computational proxy to establish a clear hierarchy of commonly adopted RL oracles under low-rank Marko Decision Processes (MDPs.