AI RESEARCH
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
arXiv cs.LG
arXiv:2604.25872v1 Announce Type: new