AI RESEARCH

Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO (57% vs. 59%), ~1500 lines from scratch [P]

r/MachineLearning

Wanted to see how close a fully bio-plausible agent could get to PPO on Pong.

Setup

- Custom Pong environment (pygame, no gym)
- PPO baseline: paper-faithful, from scratch
- Hebbian agent: PPO policy replaced with Hebbian value estimation on engineered features → 61%
- BioAgent: Predictive Coding for feature learning + distributional Hebbian plasticity for value (Dabney 2020) → 57%

Zero backprop anywhere in the pipeline.

Key observations

The 2-point gap between BioAgent and PPO (57% vs. 59%) is real but small.
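For readers unfamiliar with the distributional value idea from the setup, here is a rough sketch of what a backprop-free, Dabney-style update can look like. This is not the code from the project: the function names (`make_asymmetries`, `distributional_update`), the linear value units, the learning rate, and the coin-flip reward demo are all assumptions for illustration. Each value unit scales positive and negative prediction errors differently, so "optimistic" and "pessimistic" units settle at different points of the return distribution, and every weight change uses only the input feature and that unit's own error.

```python
import numpy as np

def make_asymmetries(n_units):
    """Evenly spaced asymmetry levels in (0, 1); > 0.5 means optimistic (assumed scheme)."""
    return (np.arange(n_units) + 0.5) / n_units

def distributional_update(W, phi, reward, phi_next, done, taus, lr=0.01, gamma=0.99):
    """One local update for a population of linear value units (illustrative only).

    W has shape (n_units, n_features); phi and phi_next are feature vectors.
    """
    v = W @ phi                                      # per-unit value estimate
    v_next = np.zeros_like(v) if done else W @ phi_next
    delta = reward + gamma * v_next - v              # per-unit prediction error
    # Asymmetric scaling: positive errors weighted by tau, negative by (1 - tau).
    scale = np.where(delta > 0, taus, 1.0 - taus)
    # Local three-factor rule: input activity times the unit's own scaled error.
    W += lr * (scale * delta)[:, None] * phi[None, :]
    return delta

# Toy check (assumed setup): one state, terminal reward of -1 or +1 with equal odds.
rng = np.random.default_rng(0)
taus = make_asymmetries(5)
W = np.zeros((5, 1))
phi = np.ones(1)
for _ in range(20000):
    r = rng.choice([-1.0, 1.0])
    distributional_update(W, phi, r, phi, True, taus)
values = W[:, 0]
print(values)  # pessimistic units end up lower, optimistic units higher
```

The spread of `values` across units is what carries the distributional information; a single symmetric unit (tau = 0.5) would only track the mean.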