A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration

arXiv:2605.05791v1

While reinforcement learning (RL) promises to revolutionize the control of complex nonlinear robotic systems, a profound gap persists between the heuristic success of model-free off-policy deep RL and the underlying theory, which remains largely confined to tabular or linearizable settings.