SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks

arXiv:2506.14512v4

Large Language Models (LLMs) have progressed rapidly, largely owing to reinforcement learning on complex reasoning tasks. In contrast, although spatial intelligence is fundamental for Vision-Language Models (VLMs) in real-world interaction, their complex spatial reasoning has not been studied systematically. To bridge this gap, we