Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning

arXiv:2508.19900v2

Offline reinforcement learning (RL) enables learning effective policies from fixed datasets without any environment interaction. Existing methods typically employ policy constraints to mitigate the distribution shift encountered during offline RL