How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

arXiv:2604.15294v1 Announce Type: new

Over the past year, spatial intelligence has drawn increasing attention. Much prior work studies it from the perspective of visual-spatial intelligence, where models have access to visuospatial information from visual inputs. However, two questions remain unexplored: whether linguistic intelligence alone is sufficient to endow models with spatial intelligence when no visual information is available, and how models perform the relevant tasks given text-only inputs.