Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

arXiv:2404.12598v2

This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with an exponential-form objective. The risk-sensitive objective arises either from the agent's risk attitude or as a distributionally robust approach to model uncertainty.
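The link between an exponential-form objective and distributional robustness can be illustrated with a small numerical check. The sketch below is not taken from the paper: it assumes a discrete random return Z under a nominal model P, a risk parameter `eps` (a hypothetical name; the paper's notation and sign convention may differ), and uses the standard Gibbs variational identity, under which (1/eps) log E_P[exp(eps Z)] equals E_Q[Z] - (1/eps) KL(Q || P) evaluated at the exponentially tilted model Q* proportional to P exp(eps Z).

```python
import math


def entropic_risk(values, probs, eps):
    """Exponential-form objective (1/eps) * log E_P[exp(eps * Z)] for discrete Z."""
    # Shift by the largest exponent (log-sum-exp trick) for numerical stability.
    m = max(eps * z for z in values)
    s = sum(p * math.exp(eps * z - m) for z, p in zip(values, probs))
    return (m + math.log(s)) / eps


def robust_value(values, probs, eps):
    """E_Q[Z] - (1/eps) * KL(Q || P) at the tilted model Q* ~ P * exp(eps * Z).

    For eps < 0 this is the worst-case (infimum over Q) penalized expectation;
    for eps > 0 it is the best-case (supremum over Q) one.
    """
    m = max(eps * z for z in values)
    weights = [p * math.exp(eps * z - m) for z, p in zip(values, probs)]
    total = sum(weights)
    q = [w / total for w in weights]
    mean_q = sum(qi * z for qi, z in zip(q, values))
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, probs) if qi > 0)
    return mean_q - kl / eps


# Illustrative (made-up) return distribution under the nominal model P.
values = [1.0, 2.0, 4.0]
probs = [0.5, 0.3, 0.2]
mean = sum(p * z for z, p in zip(values, probs))

risk_averse = entropic_risk(values, probs, eps=-0.5)
risk_seeking = entropic_risk(values, probs, eps=0.5)
robust = robust_value(values, probs, eps=-0.5)
```

Under these assumptions, `risk_averse` and `robust` agree up to floating-point error, and by Jensen's inequality `risk_averse` lies below the plain expectation `mean` while `risk_seeking` lies above it. With a negative `eps`, the exponential objective therefore matches a worst-case expectation over alternative models, penalized by their relative entropy to P.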