CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs

arXiv:2603.27958v1 Announce Type: new Analogical reasoning tests a fundamental aspect of human cognition: mapping the relation from one pair of objects to another. Existing evaluations of this ability in multimodal large language models (MLLMs) overlook the ability to compose rules from multiple sources, a critical component of higher-order intelligence. To close this gap, we