AI RESEARCH
Does This Gradient Spark Joy?
arXiv CS.AI
arXiv:2603.20526v1 Announce Type: cross Policy gradient computes a backward pass for every sample, even though the backward pass is expensive and most samples carry little learning value. The Delightful Policy Gradient (DG) provides a forward-pass signal of learning value: delight, the product of advantage and surprisal (negative log-probability). We
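
To make the definition concrete, here is a minimal Python sketch of delight as stated in the abstract: advantage multiplied by surprisal, where surprisal is the negative log-probability of the sampled action. The function names, the sample values, and the threshold gate are illustrative assumptions. The abstract is cut off before it describes how DG uses the signal, so the gating step below should not be read as the paper's method.

```python
import math


def delight(advantage, log_prob):
    """Delight = advantage * surprisal, with surprisal = -log_prob."""
    surprisal = -log_prob
    return advantage * surprisal


def select_for_backward(advantages, log_probs, threshold):
    """Hypothetical gate (an assumption, not from the source):
    return all delight scores and the indices whose score exceeds threshold."""
    scores = [delight(a, lp) for a, lp in zip(advantages, log_probs)]
    keep = [i for i, s in enumerate(scores) if s > threshold]
    return scores, keep


# Made-up per-sample advantages and action probabilities, for illustration only.
advantages = [1.0, -0.5, 2.0, 0.1]
probs = [0.5, 0.9, 0.1, 0.99]
log_probs = [math.log(p) for p in probs]

scores, keep = select_for_backward(advantages, log_probs, threshold=0.5)
print(scores)
print(keep)
```

Both inputs are available from the forward pass alone, so a score like this can be computed before deciding whether a backward pass is worth running. In the example, the high-advantage, low-probability sample gets the largest score, while a likely action with a small advantage scores near zero.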