More than the Sum: Panorama-Language Models for Adverse Omni-Scenes

arXiv:2603.09573v1 Announce Type: new Existing vision-language models (VLMs) are tailored for pinhole imagery, stitching multiple narrow field-of-view inputs to piece together a complete omni-scene understanding. Yet, such multi-view perception overlooks the holistic spatial and contextual relationships that a single panorama inherently preserves. In this work, we