MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

arXiv:2604.17435v1 Announce Type: new Recent Speech-to-Speech Translation (S2ST) systems achieve strong semantic accuracy yet consistently strip away non-verbal vocalizations (NVs), such as laughter and crying, that convey pragmatic intent, which severely limits their real-world utility. We address this via three contributions. First, we propose a synthesis pipeline for building scalable expressive datasets to overcome data scarcity.