PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech

arXiv cs.CL

arXiv:2604.25476v1 Announce Type: cross Standard text-to-speech (TTS) evaluation measures intelligibility (WER, CER) and overall naturalness (MOS, UTMOS) but does not quantify accent. A synthesiser may score well on all four metrics yet sound non-native on features that are phonemic in the target language. For Indic languages, these features include retroflex articulation, aspiration, vowel length, and the Tamil retroflex approximant (the letter zha). We present PSP, the Phoneme Substitution Profile, an interpretable, per-phonological-dimension accent benchmark for Indic TTS.