VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI

arXiv:2509.13767v3 Announce Type: replace Accurately segmenting articulatory structures in real-time magnetic resonance imaging (rtMRI) remains challenging, as most existing methods rely almost entirely on visual cues. Yet synchronized acoustic and phonological signals provide complementary context that can enrich visual information and improve precision. In this paper, we