Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

arXiv:2605.08053v1 Announce Type: new Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied in \cite{porteus1975optimality}, we derive two Q-value-style extensions and show that the associated operators are contractions in the $L_\infty$ and sup-log/Thompson metrics, respectively.
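
To make the two metrics named above concrete, here is a minimal sketch of how they differ on finite vectors. It assumes the standard definitions: the $L_\infty$ distance is $\max_i |u_i - v_i|$, and the sup-log (Thompson) distance between strictly positive vectors is $\max_i |\log u_i - \log v_i|$. The function names `sup_distance` and `thompson_distance` are hypothetical, and the vectors are illustrative values only; this is not the paper's operators or algorithms.

```python
import math


def sup_distance(u, v):
    """L_infinity distance: largest absolute entrywise difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def thompson_distance(u, v):
    """Sup-log (Thompson) distance between strictly positive vectors."""
    if any(a <= 0 for a in u) or any(b <= 0 for b in v):
        raise ValueError("Thompson distance needs strictly positive entries")
    return max(abs(math.log(a) - math.log(b)) for a, b in zip(u, v))


# Illustrative vectors (assumed values, not taken from the paper).
u = [1.0, 2.0, 4.0]
v = [2.0, 2.0, 1.0]

# Rescaling both vectors by the same positive constant changes the
# L_infinity distance but leaves the Thompson distance unchanged.
scaled_u = [10.0 * a for a in u]
scaled_v = [10.0 * b for b in v]

print(sup_distance(u, v), sup_distance(scaled_u, scaled_v))
print(thompson_distance(u, v), thompson_distance(scaled_u, scaled_v))
```

The sup-log distance is only defined for positive quantities and is invariant to a common positive rescaling, which is one plausible reason it suits multiplicative (exponential-utility) value representations, whereas the $L_\infty$ distance suits additive ones.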