Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder

arXiv:2603.08312v1 Announce Type: new Speech foundation models trained with self-supervised learning produce generic speech representations that support a wide range of speech processing tasks. When further adapted with supervised learning, these models can achieve strong performance on specific downstream tasks. Recent post-