Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors

arXiv:2603.21921v1 Announce Type: cross

The temporal difference (TD) error was first formalized in Sutton, where it was initially characterized as the difference between temporally successive predictions and later, in the same work, formulated as the difference between a bootstrapped target and a prediction. Since then, these two interpretations of the TD error have been used interchangeably in the literature, with the latter eventually being adopted as the standard critic loss in deep reinforcement learning (RL) architectures.
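
The two readings of the TD error described above can be made concrete with a minimal sketch. The function names, the discount factor `gamma`, and the numbers below are illustrative assumptions, not notation taken from the paper. The sketch only shows the simple scalar case, where the two readings agree once the "next prediction" is identified with the bootstrapped target.

```python
def td_error_successive(pred_t, pred_next):
    """Reading 1: difference between temporally successive predictions."""
    return pred_next - pred_t


def td_error_bootstrapped(value_t, reward, value_next, gamma):
    """Reading 2: bootstrapped target minus the current prediction."""
    target = reward + gamma * value_next  # one-step bootstrapped target
    return target - value_t


# Hypothetical values chosen purely for illustration.
v_t, v_next, r, gamma = 1.0, 2.0, 0.5, 0.9

# Reading 2 applied directly to a (value, reward, next value) triple.
delta_boot = td_error_bootstrapped(v_t, r, v_next, gamma)

# Reading 1, under the assumption that the later prediction is taken
# to be the bootstrapped target r + gamma * V(s').
delta_succ = td_error_successive(v_t, r + gamma * v_next)

print(delta_boot, delta_succ)
```

Under that identification the two functions return the same number, which is consistent with the interpretations being used interchangeably; the sketch makes no claim about settings where they might differ.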