AI RESEARCH

Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients

arXiv cs.LG

arXiv:2602.00474v2 Announce Type: replace-cross We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods based on the gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity.
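
As a rough illustration of the ambiguity described above (not the authors' method), the sketch below uses a two-state periodic chain. The gain, computed as a Cesàro average, is constant, yet the expected reward keeps alternating with the phase forever. The function names `cesaro_gain` and `peripheral_dimension`, the example chain, and the reward vector are all hypothetical. The sketch also assumes that $\mathcal{K}(P)$ can be read as the real span of the eigenvectors whose eigenvalues have modulus one, so that its dimension equals the number of such eigenvalues; the abstract does not spell out the definition.

```python
import numpy as np

def cesaro_gain(P, r, horizon=2000):
    """Approximate the gain as the Cesaro average (1/T) * sum_{t<T} P^t r."""
    total = np.zeros(len(r), dtype=float)
    v = np.asarray(r, dtype=float)
    for _ in range(horizon):
        total += v      # accumulate the expected reward at step t
        v = P @ v       # advance one step: v becomes P^(t+1) r
    return total / horizon

def peripheral_dimension(P, tol=1e-9):
    """Count eigenvalues of P on the unit circle (with multiplicity).

    Assumption: this count is taken as the dimension of the real
    peripheral invariant subspace K(P).
    """
    eigvals = np.linalg.eigvals(P)
    return int(np.sum(np.abs(np.abs(eigvals) - 1.0) < tol))

# Hypothetical example: a deterministic two-cycle (irreducible, period 2).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
r = np.array([1.0, 0.0])   # reward 1 in state 0, reward 0 in state 1

gain = cesaro_gain(P, r)            # time-averaged reward per state
dim_K = peripheral_dimension(P)     # eigenvalues +1 and -1 both lie on the unit circle
step1 = P @ r                       # expected reward one step ahead
step2 = P @ step1                   # expected reward two steps ahead

print("gain:", gain)
print("dim K(P) (assumed reading):", dim_K)
print("P r:", step1, " P^2 r:", step2)
```

In this example the gain is the same in both states, while `P @ r` and `P @ (P @ r)` keep swapping. That persistent oscillation never decays, so it is not transient, but the gain does not record it either.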