Sample-efficient Neuro-symbolic Proximal Policy Optimization

arXiv:2604.25534v1 Announce Type: new Deep Reinforcement Learning (DRL) algorithms often require large amounts of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a neuro-symbolic extension of Proximal Policy Optimization (PPO) that transfers partial logical policy specifications learned in easier instances to guide learning in more challenging settings. We