Execution-Grounded Credit Assignment for GRPO in Code Generation
arXiv CS.LG
arXiv:2603.16158v1 Announce Type: new

Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single outcome signal is spread uniformly across a long program even when the failure stems from a localized semantic error. We propose Execution-Grounded Credit Assignment (EGCA), which localizes GRPO updates using execution traces.
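To make the credit-assignment problem concrete, the sketch below contrasts the uniform per-token advantage of a GRPO-style update with a localized alternative. It is an illustration under stated assumptions, not the paper's EGCA procedure: the abstract says only that updates are localized using execution traces, so the `blamed_span` argument and the zero-elsewhere mask are hypothetical stand-ins for whatever the trace analysis produces. The group normalization uses the population standard deviation, which is also an assumption; implementations differ on this detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mu) / (sigma + eps) for r in rewards]


def uniform_token_advantages(advantage, num_tokens):
    """Coarse credit assignment: every token receives the same outcome signal."""
    return [advantage] * num_tokens


def localized_token_advantages(advantage, num_tokens, blamed_span):
    """Hypothetical localization: keep the signal only on tokens in blamed_span.

    blamed_span is a half-open (start, end) token range, assumed here to come
    from some execution-trace analysis that is not specified in the abstract.
    """
    start, end = blamed_span
    return [advantage if start <= t < end else 0.0 for t in range(num_tokens)]


# Four sampled programs for one prompt; reward 1.0 means all unit tests pass.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# The second sample failed. Under the uniform scheme all 6 of its tokens are
# penalized equally; under the hypothetical mask only tokens 2 and 3 are.
uniform = uniform_token_advantages(advantages[1], num_tokens=6)
localized = localized_token_advantages(advantages[1], num_tokens=6, blamed_span=(2, 4))
```

In the uniform case, tokens that implement correct parts of a failing program are pushed down as hard as the tokens responsible for the bug, which is the coarseness the abstract describes.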