Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

arXiv:2604.13517v1 Announce Type: cross
Temporal credit assignment has long been a central challenge in reinforcement learning. Inspired by the multi-timescale encoding of the dopamine system in neurobiology, recent research has sought to