AI RESEARCH
The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation
arXiv CS.CL
arXiv:2604.26347v1 Announce Type: cross Objective metrics for emotional expressiveness are vital for speech generation, particularly in expressive synthesis and voice conversion, which require emotional prosody transfer. To quantify expressiveness, the field widely relies on emotion similarity between reference and generated samples. This approach computes the cosine similarity of embeddings from encoders such as emotion2vec, assuming that they capture affective cues despite linguistic and speaker variations. We challenge this assumption through controlled adversarial tasks and human alignment tests.
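As a minimal illustration of the metric under examination, the sketch below computes the cosine similarity between a reference embedding and a generated embedding. It is not the authors' evaluation code: the function name `cosine_similarity` and the three-dimensional placeholder vectors are assumptions made for this example, and in practice the vectors would be utterance-level embeddings produced by an emotion encoder such as emotion2vec.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("embeddings must be non-empty and of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors (assumption): stand-ins for encoder outputs of the
# reference utterance and the generated utterance.
reference_embedding = [0.2, 0.7, 0.1]
generated_embedding = [0.25, 0.6, 0.2]

score = cosine_similarity(reference_embedding, generated_embedding)
print(f"emotion similarity: {score:.3f}")
```

The score depends only on the direction of the two vectors, not their magnitude, so it is only as meaningful as the encoder's embedding space. If that space also encodes speaker identity or linguistic content, a high score need not indicate matching emotion, which is the assumption the abstract sets out to test.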