AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning

arXiv:2605.06149v1

The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. State-dependent discounting is conceptually appealing, but naive deep actor-critic implementations can become unstable and degenerate toward TD-error collapse.
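The two roles of the discount factor can be made concrete with a small sketch. It assumes the common heuristic that a discount gamma implies a planning horizon of roughly 1 / (1 - gamma). It also assumes one-step TD targets in which a per-transition discount, here taken as a function of the next state s', replaces the usual scalar. The function names `effective_horizon` and `td_targets` are hypothetical, and this is not the AdaGamma method itself, whose parameterization the abstract does not specify.

```python
import numpy as np


def effective_horizon(gamma: float) -> float:
    """Heuristic planning horizon implied by a discount factor: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def td_targets(rewards, next_values, next_gammas, dones):
    """One-step TD targets y = r + gamma(s') * (1 - done) * V(s').

    Assumption: next_gammas holds one discount per transition (state-dependent)
    instead of a single scalar shared by all states.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    next_gammas = np.asarray(next_gammas, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # The discount scales how much of the bootstrapped estimate V(s') enters the target.
    return rewards + next_gammas * (1.0 - dones) * next_values


# Same reward and next-state value, three different per-state discounts.
targets = td_targets([1.0, 1.0, 1.0], [10.0, 10.0, 10.0], [0.9, 0.5, 0.0], [0.0, 0.0, 0.0])
print(targets)                   # [10.  6.  1.]
print(effective_horizon(0.5))    # 2.0
```

With gamma = 0 the target reduces to the immediate reward and bootstrapping disappears entirely. This illustrates why the discount sets both the horizon and the strength of bootstrapping, and why letting it vary per state changes the learning signal the critic sees.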