AI RESEARCH
On the "Causality" Step in Policy Gradient Derivations: A Pedagogical Reconciliation of Full Return and Reward-to-Go
arXiv CS.AI
arXiv:2604.04686v1 Announce Type: new