PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

arXiv:2604.09111v1 Announce Type: cross Recent advances in artificial intelligence-based dubbing technology have enabled automated dubbing (AD), which converts the source speech of a video into target speech in a different language. However, natural AD still faces synchronization challenges, notably duration synchronization and lip synchronization (lip-sync), both of which are crucial for preserving the viewer experience.