Good Scores, Bad Data: A Metric for Multimodal Coherence

arXiv:2603.25924v1

Multimodal AI systems are evaluated by downstream task accuracy, but high accuracy does not mean the underlying data is coherent. A model can score well on Visual Question Answering (VQA) while its inputs contradict each other. We