TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation

arXiv:2509.18060v2 Announce Type: replace Tibetan is a low-resource language, and parallel speech corpora spanning its three major dialects (Ü-Tsang, Amdo, and Kham) are scarce, which limits progress in speech modeling. To address this issue, we propose TMD-TTS, a unified Tibetan multi-dialect text-to-speech (TTS) framework that synthesizes parallel dialectal speech from explicit dialect labels. Our method features a dialect fusion module and a Dialect-Specialized Dynamic Routing Network (DSDR-Net) to capture fine-grained acoustic and linguistic variations across dialects.
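The abstract does not specify how the dialect fusion module or DSDR-Net is built, so the following is only a minimal illustrative sketch of the general idea of conditioning on an explicit dialect label. Everything in it is an assumption rather than the authors' design: the class name `DialectRoutedLayer`, the additive fusion of a dialect embedding, the hard selection of one per-dialect expert matrix, the residual connection, and the random untrained weights are all hypothetical.

```python
import random

# Dialect labels named in the abstract.
DIALECTS = ["Ü-Tsang", "Amdo", "Kham"]


def make_matrix(rows, cols, rng):
    """Build a small random weight matrix (stand-in for trained weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DialectRoutedLayer:
    """Hypothetical layer, NOT the paper's architecture.

    Step 1 (fusion, assumed additive): add a per-dialect embedding
    to every hidden frame.
    Step 2 (routing, assumed hard): send the fused frame through the
    expert matrix that belongs to the requested dialect label.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.embed = {
            d: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for d in DIALECTS
        }
        self.experts = {d: make_matrix(dim, dim, rng) for d in DIALECTS}

    def forward(self, frames, dialect):
        if dialect not in self.experts:
            raise ValueError(f"unknown dialect label: {dialect!r}")
        emb = self.embed[dialect]
        expert = self.experts[dialect]
        outputs = []
        for hidden in frames:
            fused = [x + e for x, e in zip(hidden, emb)]
            routed = matvec(expert, fused)
            # Residual connection keeps the shared text content in the output.
            outputs.append([f + r for f, r in zip(fused, routed)])
        return outputs


# The same text-side features, paired with each label, yield three
# "parallel" dialect-conditioned outputs.
layer = DialectRoutedLayer(dim=4)
text_features = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.5]]
parallel = {d: layer.forward(text_features, d) for d in DIALECTS}
```

In this sketch the dialect label is the only thing that changes between the three outputs, which mirrors the abstract's notion of generating parallel dialectal speech from one input plus an explicit label. A real system would learn these weights and would feed the result into an acoustic decoder and vocoder.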