Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes

ArXi:2603.28595v1 Announce Type: new Although actor-critic methods have been successful in practice, their theoretical analyses have several limitations. Specifically, existing theoretical work either sidesteps the exploration problem by making strong assumptions or analyzes impractical methods with complicated algorithmic modifications. Moreover, the actor-critic methods analyzed for linear MDPs often employ natural policy gradient (NPG) and construct "implicit" policies without explicit parameterization.