Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

arXiv:2605.08354v1 Announce Type: new

Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing nuanced preferences into opaque parametric proxies that are vulnerable to reward hacking.