Frictional Q-Learning

arXiv:2509.19771v4 Announce Type: replace

Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From this perspective, the replay buffer is represented as a smooth, low-dimensional action manifold: directions along the manifold, which the buffer data support, correspond to the tangential component, while the normal component captures the dominant first-order extrapolation error.
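The tangential/normal split can be made concrete with a small geometric sketch. This is an illustration under stated assumptions, not the authors' method. It estimates the tangent space of the action manifold at a point by local PCA on nearby replay-buffer actions (an assumption; the abstract does not say how the manifold is estimated). It then splits a candidate action's offset from the local mean into a tangential part (along the data) and a normal part (off the data). The helper name `tangent_normal_split` and the parameter `k` (the assumed manifold dimension) are hypothetical.

```python
import numpy as np


def tangent_normal_split(action, neighbors, k):
    """Split (action - local mean) into tangential and normal parts.

    Hypothetical helper: the tangent space is estimated by local PCA on
    nearby replay-buffer actions, keeping the top-k principal directions.
    This is an illustrative assumption, not the paper's estimator.
    """
    neighbors = np.asarray(neighbors, dtype=float)
    center = neighbors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(neighbors - center, full_matrices=False)
    basis = vt[:k]  # (k, d) orthonormal basis of the estimated tangent space
    delta = np.asarray(action, dtype=float) - center
    # Orthogonal projection onto the tangent space; the residual is normal.
    tangential = basis.T @ (basis @ delta)
    normal = delta - tangential
    return tangential, normal


# Buffer actions lie on the horizontal axis of a 2-D action space (a 1-D manifold).
buffer_actions = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
t, n = tangent_normal_split([1.5, 0.2], buffer_actions, k=1)
print("tangential:", t, "normal:", n)
```

In this toy example the candidate action's horizontal offset stays along the data and comes out as the tangential part. Its vertical offset of 0.2 points off the data support and comes out entirely as the normal part, the kind of deviation the abstract associates with first-order extrapolation error.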