Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning

arXiv:2603.12669v1 Announce Type: cross
With the growing number and diversity of Vision-Language Models (VLMs), many works explore language-based ensemble, collaboration, and routing techniques across multiple VLMs to improve multi-model reasoning. In contrast, we address diverse model selection using both the vision and language modalities. We