Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

arXiv:2507.11662v3 Announce Type: replace-cross Verifiers (functions that assign rewards to agent behavior) have been key to AI progress in math, code, and games. However, extending these gains to domains without clear-cut success criteria remains a challenge: while humans can recognize desired outcomes, translating this intuition into scalable rules is nontrivial. Multimodal LLMs (MLLMs) offer a promising solution, given their world knowledge, human-preference alignment, and reasoning capabilities.