Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

arXiv:2409.12135v3

Temporal difference (TD) learning with linear function approximation (linear TD) is a classic and powerful prediction algorithm in reinforcement learning. It is well understood that linear TD converges almost surely to a unique point, but this result traditionally rests on the assumption that the features used by the approximator are linearly independent. That assumption fails in many practical scenarios.
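
To make the setting concrete, here is a minimal sketch of linear TD(0) run on a small Markov reward process whose feature matrix is deliberately rank deficient. Everything in it is an illustrative assumption rather than something taken from the paper: the three-state chain, the rewards, the discount factor, the step-size schedule, and the names `linear_td0`, `X`, `P`, and `r`.

```python
import numpy as np

def linear_td0(features, transition, rewards, gamma=0.9, steps=5000, seed=0):
    """Run linear TD(0) along one sampled trajectory and return the weights."""
    rng = np.random.default_rng(seed)
    n_states, n_features = features.shape
    w = np.zeros(n_features)
    s = 0
    for t in range(steps):
        s_next = rng.choice(n_states, p=transition[s])
        alpha = 1.0 / (t + 10)  # decaying step size (illustrative choice)
        # TD error: reward plus discounted next-state estimate minus current estimate
        td_error = rewards[s] + gamma * features[s_next] @ w - features[s] @ w
        w += alpha * td_error * features[s]
        s = s_next
    return w

# Hypothetical 3-state chain. The third feature column duplicates the first,
# so the columns of X are linearly dependent.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.1, 0.5],
              [0.5, 0.3, 0.2]])
r = np.array([1.0, 0.0, -1.0])

w = linear_td0(X, P, r)
rank = np.linalg.matrix_rank(X)
print("feature rank:", rank, "of", X.shape[1], "features")
print("weights:", w)
```

Because two columns of `X` coincide, many different weight vectors yield the same value estimates `X @ w`, which is the situation the linear independence assumption rules out.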