AI RESEARCH
QuantFPFlow: Quantum Amplitude Estimation for Fokker–Planck Policy Optimisation in Continuous Reinforcement Learning
arXiv CS.LG
arXiv:2605.16429v1 Announce Type: new The estimated stationary distribution $\rho^{*}$ drives a theoretically grounded exploration bonus $R_{\mathrm{aug}} = R_{\mathrm{en}} + \alpha\log(1/\rho^{*}(s))$. This bonus steers the agent toward globally optimal regions of multimodal reward landscapes while simultaneously cons
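
The bonus formula can be illustrated with a minimal Python sketch. It only evaluates $R_{\mathrm{aug}} = R_{\mathrm{en}} + \alpha\log(1/\rho^{*}(s))$ for a density value that is already given; it does not perform the quantum amplitude estimation or solve any Fokker–Planck equation. The function name `augmented_reward`, the default `alpha`, and the `eps` floor that guards against a zero density are assumptions made for illustration, not details taken from the paper.

```python
import math


def augmented_reward(r_en: float, rho_star: float,
                     alpha: float = 0.1, eps: float = 1e-8) -> float:
    """Return R_aug = R_en + alpha * log(1 / rho_star(s)).

    r_en     : base reward at state s (the R_en term in the abstract)
    rho_star : estimated stationary density at s, assumed to be supplied
    alpha    : bonus weight (hypothetical default)
    eps      : hypothetical floor so that log(1 / 0) never occurs
    """
    rho = max(rho_star, eps)  # clip tiny or zero densities
    return r_en + alpha * math.log(1.0 / rho)


if __name__ == "__main__":
    # Two states with the same base reward but different visitation density:
    # the rarely visited state receives the larger augmented reward.
    common = augmented_reward(r_en=1.0, rho_star=0.5)
    rare = augmented_reward(r_en=1.0, rho_star=0.01)
    print(f"common state: {common:.4f}")
    print(f"rare state:   {rare:.4f}")
```

Because $\log(1/\rho^{*}(s))$ grows as the density shrinks, states the agent rarely occupies earn a larger bonus, which is the exploration pressure the abstract describes.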