A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

arXiv:2605.20033v1 Announce Type: new Multimodal large language models often generate reasoning chains containing subtle errors that lead to incorrect answers. Current verification approaches have notable limitations. Learned critics need extensive labeled data and show inconsistent performance across different tasks. Meanwhile, existing