AI RESEARCH
Languages in Whisper-Style Speech Encoders Align Both Phonetically and Semantically
arXiv cs.CL
arXiv:2505.19606v2. Cross-lingual alignment in pretrained language models enables knowledge transfer across languages. Similar alignment has been reported in Whisper-style speech encoders, based on spoken translation retrieval using representational similarity. However, prior work does not control for phonetic overlap between equivalent utterances, which may artificially support retrieval. We conduct pronunciation-controlled experiments to test whether cross-lingual alignment arises from semantic rather than phonetic similarity.
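As a rough illustration of what translation retrieval by representational similarity can look like, here is a minimal Python sketch. It is not the paper's pipeline: the mean pooling over frames, the cosine similarity measure, the helper names `mean_pool` and `retrieval_accuracy`, and the synthetic arrays standing in for encoder outputs are all assumptions made for this example.

```python
import numpy as np


def mean_pool(frames):
    """Collapse a (num_frames, dim) array into one utterance vector (assumed pooling)."""
    return np.asarray(frames, dtype=float).mean(axis=0)


def retrieval_accuracy(src, tgt):
    """Fraction of source utterances whose nearest target (by cosine) is the paired one.

    src and tgt are (n, dim) arrays; row i of src is assumed to be the
    translation of row i of tgt.
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    # Normalise rows so the dot product equals cosine similarity.
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src_n @ tgt_n.T              # (n, n) similarity matrix
    predicted = sim.argmax(axis=1)     # best-matching target per source utterance
    return float((predicted == np.arange(len(src))).mean())


# Synthetic stand-in for encoder outputs: each "utterance" shares a latent
# vector across the two languages, plus small frame-level noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(5, 16))
src_frames = [m + 0.05 * rng.normal(size=(20, 16)) for m in latent]
tgt_frames = [m + 0.05 * rng.normal(size=(30, 16)) for m in latent]

src = np.stack([mean_pool(f) for f in src_frames])
tgt = np.stack([mean_pool(f) for f in tgt_frames])
accuracy = retrieval_accuracy(src, tgt)
```

A high score from such a procedure cannot by itself distinguish semantic from phonetic similarity, which is the confound the pronunciation-controlled experiments are meant to address.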