R-GTD: A Geometric Analysis of Gradient Temporal-Difference Learning in Singular Regimes

arXiv:2601.20599v2 Announce Type: replace-cross Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. While some prior works have applied regularization to relax the nonsingularity assumption, their theoretical guarantees inevitably rely on other restrictive conditions.