From Cumulative Constraints to Adaptive Runtime Safety Control for Nonstationary Reinforcement Learning

arXiv:2605.18841v1 Announce Type: new Safety in reinforcement learning is often specified through cumulative cost constraints, but these trajectory-level guarantees do not directly prevent unsafe individual decisions, especially under nonstationarity. In continual and nonstationary settings, the difficulty is amplified because the risk associated with the same action can vary across contexts, while a fixed state-level threshold may be either too conservative or too weak.