Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation

arXiv:2601.22228v2 Announce Type: replace-cross We study whether vision-language models (VLMs) can solve relative camera pose estimation (RCPE) from image pairs, a direct test of multi-view spatial reasoning. We cast RCPE as a discrete verbal classification task and