StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

arXiv:2605.11922v1 Announce Type: cross Existing code reasoning methods primarily supervise final code outputs and ignore intermediate states, which often leads to reward hacking, where correct answers are obtained through inconsistent reasoning. We propose StepCodeReasoner, a framework that