DiPRL: Learning Discrete Programmatic Policies via Architecture Entropy Regularization

arXiv:2605.18508v1 Announce Type: new Programmatic reinforcement learning (PRL) offers an interpretable alternative to deep reinforcement learning by representing policies as human-readable and human-editable programs. While gradient-based methods have been developed to optimize continuous relaxations of programs, they suffer a significant performance drop when the continuous relaxations are converted back into discrete programs.
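
The discretization gap described above can be illustrated with a toy sketch. It is not the paper's algorithm: the three candidate programs, the softmax mixture used as the relaxation, and the helper names (`relaxed_policy`, `discretized_policy`, `discretization_gap`) are all assumptions made for illustration. The entropy of the mixture weights is shown only because the title mentions architecture entropy; how DiPRL actually uses it is not stated in this excerpt.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical candidate programs: each maps an observation to an action.
CANDIDATES = [
    lambda obs: obs,    # follow the observation
    lambda obs: -obs,   # oppose the observation
    lambda obs: 0.0,    # do nothing
]


def relaxed_policy(logits, obs):
    """Continuous relaxation: softmax-weighted mixture of all candidates."""
    weights = softmax(logits)
    return sum(w * prog(obs) for w, prog in zip(weights, CANDIDATES))


def discretized_policy(logits, obs):
    """Discrete program: keep only the candidate with the largest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CANDIDATES[best](obs)


def discretization_gap(logits, obs):
    """Absolute difference between the relaxed and the discrete action."""
    return abs(relaxed_policy(logits, obs) - discretized_policy(logits, obs))


if __name__ == "__main__":
    obs = 1.0
    for name, logits in [("flat", [0.1, 0.0, 0.0]), ("peaked", [10.0, 0.0, 0.0])]:
        h = entropy(softmax(logits))
        gap = discretization_gap(logits, obs)
        print(f"{name}: entropy={h:.4f} gap={gap:.4f}")
```

In this toy setting, near-uniform weights (high entropy) give a relaxed action far from what the selected program produces, while sharply peaked weights (low entropy) make the two nearly coincide. That is one intuition for why penalizing entropy over architecture choices could shrink the gap, under the assumptions stated above.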