OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning

arXiv:2508.16198v3

Multimodal Large Language Models (MLLMs) have increasingly supported omni-modal processing across text, vision, and speech. However, existing evaluation frameworks for such models suffer from critical limitations, including modality shortcuts and biased reasoning paths. To address these challenges, we propose OMHBench, a novel benchmark designed to rigorously evaluate omni-modal multi-hop reasoning. It consists of 6,144 questions with balanced reasoning paths that are jointly grounded across all three modalities.