Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards

arXiv:2604.17957v1 Announce Type: new Process Reward Models (PRMs) have emerged as a powerful tool for providing step-level feedback when evaluating the reasoning of Large Language Models (LLMs), which frequently produce chains of thought (CoTs) containing errors even when the final answer is correct. However, existing PRM datasets remain expensive to construct, prone to annotation errors, and predominantly limited to the mathematical domain. This work