AI RESEARCH

When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient

arXiv cs.LG

arXiv:2604.25872v1 Announce Type: new