Model-based Bootstrap of Controlled Markov Chains

arXiv:2605.12410v1 Announce Type: cross We propose and analyze a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) with possibly nonstationary or history-dependent control policies, a setting that arises naturally in offline reinforcement learning (RL) when the behavior policy generating the data is unknown. We establish distributional consistency of the bootstrap transition estimator in both a single long-chain regime and the episodic offline RL regime.
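
To make the idea concrete, the following is a minimal sketch of one way a model-based bootstrap of a finite transition kernel could look. It is an illustration under stated assumptions, not the procedure analyzed in the paper: the function names (`estimate_kernel`, `bootstrap_kernels`), the uniformly random behavior policy used to simulate data, and the choice to condition on the observed state-action visit counts rather than regenerate whole trajectories are all hypothetical simplifications.

```python
import numpy as np


def estimate_kernel(states, actions, n_states, n_actions):
    """Empirical transition kernel from one trajectory.

    states has length T + 1, actions has length T (hypothetical layout).
    Unvisited (s, a) pairs default to a uniform row.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    # Count each observed transition (s_t, a_t) -> s_{t+1}.
    np.add.at(counts, (states[:-1], actions, states[1:]), 1.0)
    visits = counts.sum(axis=2)
    p_hat = np.full_like(counts, 1.0 / n_states)
    seen = visits > 0
    p_hat[seen] = counts[seen] / visits[seen][:, None]
    return p_hat, visits.astype(int)


def bootstrap_kernels(p_hat, visits, n_boot, rng):
    """Resample next states from the fitted kernel and re-estimate it.

    Assumption: visit counts are held fixed at their observed values,
    which sidesteps modeling the unknown behavior policy.
    """
    n_states, n_actions, _ = p_hat.shape
    out = np.empty((n_boot, n_states, n_actions, n_states))
    for b in range(n_boot):
        for s in range(n_states):
            for a in range(n_actions):
                n = visits[s, a]
                if n == 0:
                    out[b, s, a] = p_hat[s, a]
                else:
                    out[b, s, a] = rng.multinomial(n, p_hat[s, a]) / n
    return out


# Simulated data: 3 states, 2 actions, uniformly random actions (assumption).
rng = np.random.default_rng(0)
n_states, n_actions, T = 3, 2, 2000
true_p = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

states = np.empty(T + 1, dtype=int)
actions = np.empty(T, dtype=int)
states[0] = 0
for t in range(T):
    actions[t] = rng.integers(n_actions)
    states[t + 1] = rng.choice(n_states, p=true_p[states[t], actions[t]])

p_hat, visits = estimate_kernel(states, actions, n_states, n_actions)
boot = bootstrap_kernels(p_hat, visits, n_boot=500, rng=rng)

# Percentile interval for one kernel entry, P(s' = 1 | s = 0, a = 0).
lo, hi = np.percentile(boot[:, 0, 0, 1], [2.5, 97.5])
print(f"estimate {p_hat[0, 0, 1]:.3f}, 95% bootstrap interval [{lo:.3f}, {hi:.3f}]")
```

The spread of the bootstrap replicates around `p_hat` is what a distributional consistency result would justify using as a proxy for the sampling variability of `p_hat` around the true kernel. Whether this simplified, count-conditioned variant inherits the guarantees stated in the abstract is not something the sketch establishes.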