Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

arXiv cs.CL

arXiv:2601.13802v2

Arabic spans over 30 spoken varieties, yet no open-source text-to-speech system unifies them. Key barriers include substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark. We present Habibi, a unified-dialectal Arabic TTS framework that addresses all three. Through a multi-step curation pipeline, we repurpose open-source ASR corpora into TTS