Efficient PRM Training Data Synthesis via Formal Verification
arXiv cs.CL
arXiv:2505.15960v3 Announce Type: replace
Process Reward Models (PRMs) have emerged as a promising approach for improving LLM reasoning capabilities by providing process supervision over reasoning traces. However, existing approaches for constructing PRM…