Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases

ArXi:2510.23914v2 Announce Type: replace While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate $\gamma$, while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI is often observed to converge significantly faster.