Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

arXiv:2512.23927v2 Announce Type: replace-cross Fitted $Q$-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fail under function approximation. We analyze soft FQI without Bellman completeness and identify the stability mechanism that replaces it: local stationary norm alignment.
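
For orientation, one soft FQI iteration applies the soft (log-sum-exp) Bellman backup to the current estimate and then regresses the result onto the function class. The following is a minimal sketch, assuming linear features and an unweighted least-squares fit with optional ridge regularization. It illustrates the generic algorithm only, not the reweighting scheme or the analysis of the paper. The names `soft_value`, `soft_fqi`, `phi`, `phi_next`, `tau`, and `ridge` are illustrative assumptions.

```python
import numpy as np


def soft_value(q_next, tau):
    """Soft state value tau * log sum_a exp(q / tau), computed stably.

    q_next has shape (n, num_actions); the result has shape (n,).
    """
    m = q_next.max(axis=1, keepdims=True)
    lse = m + tau * np.log(np.exp((q_next - m) / tau).sum(axis=1, keepdims=True))
    return lse.ravel()


def soft_fqi(phi, phi_next, rewards, gamma=0.9, tau=1.0, iters=50, ridge=1e-6):
    """Generic soft FQI with linear features (hypothetical helper).

    phi:      (n, d) features of the sampled (s, a) pairs
    phi_next: (n, num_actions, d) features of (s', a') for every action a'
    rewards:  (n,) observed rewards
    Returns the weight vector theta with Q(s, a) = phi(s, a) @ theta.
    """
    n, d = phi.shape
    theta = np.zeros(d)
    # The Gram matrix is fixed across iterations, so build it once.
    gram = phi.T @ phi + ridge * np.eye(d)
    for _ in range(iters):
        q_next = phi_next @ theta                       # (n, num_actions)
        targets = rewards + gamma * soft_value(q_next, tau)
        theta = np.linalg.solve(gram, phi.T @ targets)  # least-squares projection
    return theta
```

With one-hot (tabular) features the projection step is exact, so the iteration reduces to repeated application of the soft Bellman operator. With a restricted feature class the projection can interact badly with the backup, which is the setting where stability conditions such as Bellman completeness become relevant.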