AI RESEARCH
Controllable and Verifiable Process Data Synthesis for Process Reward Models
arXiv CS.AI
arXiv:2605.02395v1. Process reward models (PRMs) rely on high-quality process supervision data, yet existing construction methods often provide limited control over error location, error type, and trajectory consistency. We propose a controllable and verifiable framework for synthesizing process supervision data for PRMs. Our framework first constructs a correct symbolic reasoning chain, then injects a template-aware error into an intermediate step, recomputes the subsequent steps under the corrupted state, and finally verifies that the injected step is not derivable from its prefix.
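The four stages named in the abstract (construct, inject, recompute, verify) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy setting in which a reasoning chain is a sequence of integer arithmetic steps, and the names `run_chain`, `ERROR_TEMPLATES`, and `synthesize` are hypothetical, as are the two error templates.

```python
import operator

# Assumption: a "symbolic reasoning chain" is modelled as integer arithmetic steps.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_chain(start, program):
    """Apply each (op, operand) pair in turn and return every intermediate result."""
    results, state = [], start
    for op, operand in program:
        state = OPS[op](state, operand)
        results.append(state)
    return results


# Hypothetical error templates: each maps (state, op, operand) to a wrong result.
ERROR_TEMPLATES = {
    "off_by_one": lambda state, op, operand: OPS[op](state, operand) + 1,
    "operator_swap": lambda state, op, operand: OPS["-" if op == "+" else "+"](state, operand),
}


def synthesize(start, program, error_index, error_type):
    """Return a correct/corrupted chain pair, or None if verification fails."""
    # 1. Construct the correct chain.
    correct = run_chain(start, program)

    # 2. Inject an error of the requested type at the requested step.
    prefix = correct[:error_index]
    state_before = prefix[-1] if prefix else start
    op, operand = program[error_index]
    corrupted_result = ERROR_TEMPLATES[error_type](state_before, op, operand)

    # 3. Verify: the injected result must differ from what the prefix derives.
    #    (In this toy setting each step has exactly one valid result.)
    if corrupted_result == correct[error_index]:
        return None

    # 4. Recompute the remaining steps from the corrupted state.
    suffix = run_chain(corrupted_result, program[error_index + 1:])

    return {
        "correct": correct,
        "corrupted": prefix + [corrupted_result] + suffix,
        "error_index": error_index,
        "error_type": error_type,
    }


example = synthesize(2, [("+", 3), ("*", 4), ("-", 5)], error_index=1, error_type="off_by_one")
rejected = synthesize(7, [("+", 0)], error_index=0, error_type="operator_swap")
```

In this sketch, `example` keeps the first step intact and carries the injected error through the later step, so the corrupted trajectory stays internally consistent after the error. `rejected` is `None` because swapping `+ 0` for `- 0` yields the same value, which shows why a verification pass is needed: a template can produce a step that is still derivable from its prefix.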