CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal

arXiv:2512.19554v3 Announce Type: replace-cross Group-relative reinforcement learning with verifiable rewards (RLVR) often wastes the most informative data it already has: the failures. When all rollouts are wrong, gradients stall; when one happens to be correct, the update usually ignores why the others are close-but-wrong, and credit can be misassigned to spurious chains. We present CARE (Contrastive Anchored REflection), a failure-centric post-