Thompson Sampling for Infinite-Horizon Discounted Decision Processes

arXiv:2405.08253v3 Announce Type: replace-cross This paper develops a viable notion of learning for sampling-based algorithms that applies in broader settings than previously considered. Specifically, we model discounted infinite-horizon MDPs with Borel state and action spaces, whose rewards and transitions depend on an unknown parameter. To analyze adaptive learning algorithms based on sampling, we