AI RESEARCH
Happiness is Sharing a Vocabulary: A Study of Transliteration Methods
arXiv CS.AI
arXiv:2510.10827v2. Transliteration has emerged as a promising means of bridging the gap between languages in multilingual NLP, with encouraging results especially for languages written in non-Latin scripts. We investigate the degree to which shared script, overlapping token vocabularies, and shared phonology contribute to the performance of multilingual models. To this end, we conduct controlled experiments using three kinds of transliteration (romanization, phonemic transcription, and substitution ciphers) as well as orthography.
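
For readers unfamiliar with the substitution-cipher condition, the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a cipher is simply a random one-to-one remapping of characters within an alphabet, and it uses a whitespace-token Jaccard score as a stand-in for "overlapping token vocabularies". The function names (`make_cipher`, `apply_cipher`, `invert`, `vocab_overlap`) and the seed are hypothetical choices made for illustration.

```python
import random


def make_cipher(alphabet, seed=0):
    """Build a random one-to-one character mapping over `alphabet`.

    Assumption: the cipher permutes characters within the same alphabet.
    """
    chars = sorted(set(alphabet))
    shuffled = chars[:]
    random.Random(seed).shuffle(shuffled)  # seeded, so the mapping is reproducible
    return dict(zip(chars, shuffled))


def apply_cipher(text, cipher):
    """Replace each mapped character; leave everything else (e.g. spaces) as-is."""
    return "".join(cipher.get(ch, ch) for ch in text)


def invert(cipher):
    """Return the inverse mapping, which exists because the cipher is one-to-one."""
    return {v: k for k, v in cipher.items()}


def vocab_overlap(a, b):
    """Jaccard overlap between the whitespace-token sets of two texts."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    cipher = make_cipher("abcdefghijklmnopqrstuvwxyz", seed=0)
    original = "happiness is sharing a vocabulary"
    encoded = apply_cipher(original, cipher)
    print(encoded)
    print(apply_cipher(encoded, invert(cipher)) == original)
    print(vocab_overlap(original, encoded))
```

Because the mapping is a bijection, the encoded text keeps the same word lengths and character-repetition patterns as the original and can be decoded exactly, while its surface tokens will generally no longer match those of the original text.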