ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization

arXiv:2605.18320v1 Announce Type: new

Offline reinforcement learning methods typically enforce strict constraints to ensure safety, yet this rigidity often prevents the discovery of optimal behaviors outside the immediate support of the behavior policy. To address this, we propose Implicit Support Expansion via stochastic Policy optimization (ISEP), which leverages a value function interpolated between in-distribution data and policy samples to implicitly expand the feasible action space.