Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization
arXiv CS.CL
arXiv:2604.13197v1 Announce Type: new
Process reward models (PRMs) provide fine-grained reward signals along the reasoning process, but