Addressing Finite-Horizon MDPs via Low-Rank Tensor Value Approximation

arXiv:2501.10598v3. We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs), are not stationary. This aggravates the challenges of high-dimensional MDPs, which already suffer from the curse of dimensionality and high sample complexity.
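
To illustrate why finite-horizon value functions are not stationary, the following is a minimal sketch of standard tabular backward induction on a small random MDP. It is not the low-rank method studied in this work; the function name `backward_induction`, the toy sizes, and the random transition and reward arrays are all assumptions made for illustration. The point is only that the Q-values form a tensor indexed by time step as well as by state and action, which is the kind of object a low-rank tensor model would aim to approximate.

```python
import numpy as np

def backward_induction(P, R, H):
    """Tabular finite-horizon dynamic programming (illustrative sketch).

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    H: horizon (number of decision steps)
    Returns Q with shape (H, S, A) and V with shape (H + 1, S).
    """
    S, A = R.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))  # V[H] = 0: no reward after the final step
    for h in range(H - 1, -1, -1):
        # Bellman backup: reward now plus expected value at step h + 1
        Q[h] = R + P @ V[h + 1]
        V[h] = Q[h].max(axis=1)
    return Q, V

# Hypothetical toy problem: 4 states, 2 actions, horizon 5
rng = np.random.default_rng(0)
S, A, H = 4, 2, 5
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
R = rng.random((S, A))

Q, V = backward_induction(P, R, H)
policy = Q.argmax(axis=2)  # one greedy action per (time step, state)
```

Here `V[0]` and `V[H - 1]` generally differ even though `P` and `R` are fixed, so a separate value function (and policy) is needed for each time step. Storing `Q` exactly requires `H * S * A` entries, which grows quickly when the state and action spaces are themselves high-dimensional.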