AI RESEARCH

On the "Causality" Step in Policy Gradient Derivations: A Pedagogical Reconciliation of Full Return and Reward-to-Go

arXiv CS.AI

arXiv:2604.04686v1 Announce Type: new