Global Cross-Modal Geo-Localization: A Million-Scale Dataset and a Physical Consistency Learning Framework

arXiv:2603.08491v1 Announce Type: new Cross-modal Geo-localization (CMGL) matches ground-level text descriptions with geo-tagged aerial imagery, a capability that is crucial for pedestrian navigation and emergency response. However, existing research is constrained by narrow geographic coverage and limited scene diversity, and so fails to reflect the immense spatial heterogeneity of global architectural styles and topographic features. To bridge this gap and facilitate universal positioning, we