Detecting and Suppressing Reward Hacking with Gradient Fingerprints

arXiv:2604.16242v1 Announce Type: new
Reinforcement learning with verifiable rewards (RLVR) typically optimizes for outcome rewards without imposing constraints on intermediate reasoning. This leaves