SARVLM: A Vision Language Foundation Model for Semantic Understanding in SAR Imagery

arXiv:2510.22665v3 Announce Type: replace-cross Synthetic Aperture Radar (SAR) is a critical imaging modality due to its all-weather operational capability. Although recent advances in self-supervised learning and masked image modeling (MIM) have enabled SAR foundation models, these approaches primarily focus on low-level visual features and often neglect multimodal representation. Moreover, multimodal data for SAR is scarce, limiting the development of robust cross-modal models.