PDCR: Perception-Decomposed Confidence Reward for Vision-Language Reasoning

arXiv:2605.13467v1

Reinforcement Learning with Verifiable Rewards (RLVR) traditionally relies on a sparse, outcome-based reward signal. Recent work shows that a fine-grained, model-intrinsic signal (rewarding growth in the model's confidence in the ground-truth answer) effectively improves language reasoning.
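The contrast between a sparse outcome reward and a dense confidence-growth reward can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: "confidence" is taken to be the model's probability of the full ground-truth answer (the exponentiated sum of its token log-probabilities) after conditioning on each successive reasoning prefix, and the dense reward for a step is the change in that probability. The function names and the input format are hypothetical and illustrative; this is not the exact formulation of the cited work or of PDCR.

```python
import math
from typing import List


def answer_confidence(token_logprobs: List[float]) -> float:
    """Probability of the whole ground-truth answer: exp of the summed token log-probs."""
    return math.exp(sum(token_logprobs))


def confidence_growth_rewards(per_step_answer_logprobs: List[List[float]]) -> List[float]:
    """Dense per-step rewards as the change in answer confidence (hypothetical scheme).

    per_step_answer_logprobs[k] holds the log-probs of the ground-truth answer tokens
    when the model is conditioned on the prompt plus the first k reasoning steps
    (index 0 means no reasoning yet). This input format is an assumption.
    """
    conf = [answer_confidence(lp) for lp in per_step_answer_logprobs]
    # Step k is rewarded by how much it raised (or lowered) confidence in the answer.
    return [conf[k] - conf[k - 1] for k in range(1, len(conf))]


def sparse_outcome_reward(predicted: str, ground_truth: str) -> float:
    """Conventional RLVR-style signal: 1.0 only if the final answer matches, else 0.0."""
    return 1.0 if predicted.strip() == ground_truth.strip() else 0.0


if __name__ == "__main__":
    # Confidence in a single-token answer rises from 0.2 to 0.5 to 0.9 over two steps.
    steps = [[math.log(0.2)], [math.log(0.5)], [math.log(0.9)]]
    print(confidence_growth_rewards(steps))  # approximately [0.3, 0.4]
    print(sparse_outcome_reward("42", "42"))
```

The per-step rewards telescope: their sum equals final confidence minus initial confidence, so the dense signal credits individual reasoning steps while remaining tied to the same ground-truth answer that the sparse reward checks only once at the end.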