On the Importance of Multistability for Horizon Generalization in Reinforcement Learning

ArXi:2605.12206v1 Announce Type: new In reinforcement learning (RL), agents acting in partially observable Marko decision processes (POMDPs) must rely on memory, typically encoded in a recurrent neural network (RNN), to integrate information from past observations. Long-horizon POMDPs, in which the relevant observation and the optimal action are separated by many time steps (called the horizon), are particularly challenging