Ara-BEST-RQ: Multi Dialectal Arabic SSL

arXiv CS.CL

arXiv:2603.21900v1 Announce Type: new

We present Ara-BEST-RQ, a family of self-supervised learning (SSL) models designed specifically for multi-dialectal Arabic speech processing. Combining 5,640 hours of crawled Creative Commons speech with publicly available datasets, we pre-train Conformer-based BEST-RQ models with up to 600M parameters. We evaluate our models on dialect identification (DID) and automatic speech recognition (ASR), achieving state-of-the-art performance on the former while using fewer parameters than competing models. We demonstrate that family-targeted pre.