Beyond Importance Sampling: Rejection-Gated Policy Optimization

arXiv:2604.14895v1 Announce Type: cross We propose a new perspective on policy optimization: rather than reweighting all samples by their importance ratios, an optimizer should select which samples are trustworthy enough to drive a policy update. Building on this view, we