Efficient Process Reward Modeling via Contrastive Mutual Information

arXiv:2604.10660v1 Announce Type: cross
Recent research has devoted considerable effort to verifying the intermediate reasoning steps of chain-of-thought (CoT) trajectories using process reward models (PRMs) and other verifier models. However