SB-TRPO: Towards Safe Reinforcement Learning with Hard Constraints

arXiv:2512.23770v3 Announce Type: replace-cross In safety-critical domains, reinforcement learning (RL) agents must often satisfy strict, zero-cost safety constraints while accomplishing tasks. Existing model-free methods frequently either fail to achieve near-zero safety violations or become overly conservative. We