SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

arXiv:2605.18209v1 Announce Type: new Spatial question answering over egocentric video requires Vision-Language Models (VLMs) to reason about 3D object positions, scene affordances, and directional relationships. The task is particularly challenging in the zero-shot setting, where no task-specific fine-tuning is available. We