SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

arXiv:2602.03916v3

Spatial reasoning is a fundamental aspect of human cognition, yet it remains a major challenge for contemporary vision-language models (VLMs). Prior work has largely relied on synthetic or LLM-generated environments with limited task designs and puzzle-like setups. These setups fail to capture the real-world complexity, visual noise, and diverse spatial relationships that VLMs encounter. To address this, we