Mismatch Aware Guidance for Robust Emotion Control in Auto-Regressive TTS Models

arXiv:2510.13293v2 Announce Type: replace While Text-to-Speech (TTS) systems can achieve fine-grained control over emotional expression via natural language prompts, a significant challenge emerges when the desired emotion (style prompt) conflicts with the semantic content of the text. This mismatch often results in unnatural-sounding speech, undermining the goal of achieving fine-grained emotional control.