Provable Distributional Value Iteration under Partial Observability

arXiv:2505.06518v3

In many real-world planning tasks, agents must tackle uncertainty about the environment's state and variability in the outcomes induced by stochastic dynamics and rewards. Motivated by recent progress in world model approaches, where latent models approximate beliefs and planning, we extend Distributional Reinforcement Learning (DistRL), which models the entire return distribution for fully observable domains, to Partially Observable Markov Decision Processes (POMDPs). Concretely, we.