QuantFPFlow: Quantum Amplitude Estimation for Fokker--Planck Policy Optimisation in Continuous Reinforcement Learning

arXiv:2605.16429v1 Announce Type: new
The estimated stationary distribution $\rho^\star$ drives a theoretically grounded exploration bonus, giving the augmented reward $R_{\mathrm{aug}}(s) = R_{\mathrm{env}}(s) + \alpha\log\bigl(1/\rho^\star(s)\bigr)$. This bonus steers the agent toward globally optimal regions of multimodal reward landscapes while simultaneously cons
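The exploration bonus is easiest to see with a small classical sketch. Per the title, the paper estimates $\rho^\star$ with quantum amplitude estimation. The sketch below instead assumes a plain normalised histogram over sampled one-dimensional states as a stand-in estimator. Both `histogram_density` and `augmented_reward` are hypothetical helpers introduced for illustration, and `eps` is an assumed guard against $\log(1/0)$ in unvisited bins.

```python
import math
from collections import Counter


def histogram_density(samples, low, high, n_bins):
    """Classical stand-in for rho*(s): a normalised 1-D histogram on [low, high].

    Assumption: this replaces the paper's quantum amplitude estimation step.
    """
    width = (high - low) / n_bins

    def bin_index(s):
        # Clamp so that states outside [low, high] fall into the edge bins.
        return min(max(math.floor((s - low) / width), 0), n_bins - 1)

    counts = Counter(bin_index(s) for s in samples)
    total = len(samples)

    def rho(s):
        # Density = fraction of samples in the bin divided by the bin width.
        return counts[bin_index(s)] / (total * width)

    return rho


def augmented_reward(r_env, s, rho, alpha, eps=1e-8):
    """R_aug = R_env + alpha * log(1 / rho*(s)); eps (an assumption) avoids log(1/0)."""
    return r_env + alpha * math.log(1.0 / (rho(s) + eps))


# Usage: 9 of 10 visits land near s = 0.1 and only 1 near s = 0.9.
rho = histogram_density([0.1] * 9 + [0.9], low=0.0, high=1.0, n_bins=2)
common = augmented_reward(1.0, 0.1, rho, alpha=0.1)  # rho = 1.8, so the bonus is negative
rare = augmented_reward(1.0, 0.9, rho, alpha=0.1)    # rho = 0.2, so the bonus is positive
print(round(common, 4), round(rare, 4))
```

Under these assumptions, the rarely visited state receives the larger augmented reward, which is the intended exploration pressure. Because $\rho^\star$ is a continuous density, it can exceed 1, so the bonus can be negative in heavily visited regions. The coefficient $\alpha$ sets how strongly this term competes with $R_{\mathrm{env}}$.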