C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences

arXiv:2604.13618v1 Announce Type: cross Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. However, most existing methods require costly rubric annotations, limiting scalability. Moreover, we find that rubric generation is vulnerable to a failure of cooperation: low-quality rubrics actively mislead reward models rather than help them.