Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning

arXiv:2602.08734v2 Announce Type: replace Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving.