SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

arXiv:2511.21471v3 Announce Type: replace Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to interact effectively with the physical environment. While multimodal large language models (MLLMs) have made significant strides, existing benchmarks often oversimplify spatial cognition, reducing it to a single-dimensional metric that fails to capture the hierarchical structure and interdependence of spatial abilities.