Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients
arXiv CS.LG
arXiv:2602.00474v2 Announce Type: replace-cross We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity.
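The ambiguity can be seen on a toy example. The sketch below is an illustration only, not the paper's construction: the two-state chain, the reward vector `r`, and the helper names `cesaro_average` and `peripheral_eigenvalues` are all hypothetical, and "peripheral" is read here in the usual sense of eigenvalues of $P$ with modulus 1. For a deterministic two-cycle, the Cesàro-averaged gain is constant, while the expected per-step reward keeps oscillating with the phase of the chain.

```python
import numpy as np

def cesaro_average(P, T):
    """Return (1/T) * sum_{t=0}^{T-1} P^t, a finite-horizon Cesaro average."""
    n = P.shape[0]
    acc = np.zeros((n, n))
    Pt = np.eye(n)  # P^0
    for _ in range(T):
        acc += Pt
        Pt = Pt @ P
    return acc / T

def peripheral_eigenvalues(P, tol=1e-9):
    """Eigenvalues of P on the unit circle (assumed meaning of 'peripheral')."""
    eigvals = np.linalg.eigvals(P)
    return eigvals[np.abs(np.abs(eigvals) - 1.0) < tol]

# Hypothetical periodic chain: state 0 -> 1 -> 0 -> ...
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
r = np.array([1.0, 0.0])  # reward only in state 0 (illustrative choice)

# Gain: the Cesaro average of expected rewards, identical for both start states.
gain = cesaro_average(P, 1000) @ r

# Expected reward at steps t = 0..3 when starting in state 0: it never settles.
per_step = [float((np.linalg.matrix_power(P, t) @ r)[0]) for t in range(4)]

# Eigenvalue -1 sits on the unit circle next to 1 and carries the oscillation.
peripheral = peripheral_eigenvalues(P)

print("gain:", gain)
print("per-step reward from state 0:", per_step)
print("peripheral eigenvalues:", peripheral)
```

Here the gain alone cannot distinguish the two starting states, although their reward sequences are permanently out of phase. In a gain and bias decomposition that persistent difference would be attributed to the bias, which is the kind of mixing of persistent and transient effects the abstract describes.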