Improving Vision-language Models with Perception-centric Process Reward Models

arXiv:2604.24583v1 Announce Type: new

Recent advances in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, RLVR's outcome-level supervision is too coarse to diagnose and correct errors within the reasoning chain.