MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition

arXiv:2605.09418v1 Announce Type: new Multi-modal cross-view place recognition remains a fundamental challenge in computer vision and robotics due to the severe viewpoint, modality, and spatial-structure discrepancies between ground observations and aerial references. To address this challenge, we present MAG-VLAQ, a foundation-model-enhanced query aggregation framework for multi-modal aerial-ground cross-view place recognition.