Policy Gradient Methods for Non-Markovian Reinforcement Learning

ArXi:2605.10816v1 Announce Type: cross We study policy gradient methods for reinforcement learning in non-Markovian decision processes (NMDPs), where observations and rewards depend on the entire interaction history. To handle this dependence, the agent maintains an internal state that is recursively updated to provide a compact summary of past observations and actions.