Continuous-time reinforcement learning: ellipticity enables model-free value function approximation

arXiv:2602.06930v2 Announce Type: replace We study off-policy reinforcement learning for controlling continuous-time Markov diffusion processes with discrete-time observations and actions. We consider model-free algorithms with function approximation that learn value and advantage functions directly from data, without unrealistic structural assumptions on the dynamics. Leveraging the ellipticity of the diffusions, we establish a new class of Hilbert-space positive definiteness and boundedness properties for the Bellman operators.
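
For orientation, the setting named in the abstract can be sketched in standard textbook notation. This is only an illustrative sketch: the symbols b, \sigma, W, \Delta, and \lambda are assumptions introduced here and are not taken from the paper, whose exact conditions may differ.

```latex
% Controlled diffusion; the action is held constant between observation
% times k\Delta (discrete-time observations and actions). Assumed notation.
dX_t = b(X_t, A_t)\,dt + \sigma(X_t, A_t)\,dW_t,
\qquad A_t = A_{k\Delta} \quad \text{for } t \in [k\Delta, (k+1)\Delta).

% Uniform ellipticity: the diffusion matrix is bounded below by a positive
% multiple of the identity, so the noise acts in every direction.
\sigma(x, a)\,\sigma(x, a)^{\top} \succeq \lambda I
\quad \text{for all } (x, a), \ \text{for some } \lambda > 0.
```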