[b]=[d]-[t]+[p]: Self-supervised Speech Models Discover Phonological Vector Arithmetic

ArXi:2602.18899v2 Announce Type: replace-cross Self-supervised speech models (S3Ms) are known to encode rich phonetic information, yet how this information is structured remains underexplored. We conduct a comprehensive study across 96 languages to analyze the underlying structure of S3M representations, with particular attention to phonological vectors. We first show that there exist linear directions within the model's representation space that correspond to phonological features.