M2-Verify: A Large-Scale Multidomain Benchmark for Checking Multimodal Claim Consistency

arXiv:2604.01306v2 Announce Type: replace Evaluating scientific arguments requires assessing the strict consistency between a claim and its underlying multimodal evidence. However, existing benchmarks lack the scale, domain diversity, and visual complexity needed to evaluate this alignment realistically. To address this gap, we