Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark

arXiv:2604.10580v1

Spoken meaning often depends not only on what is said but also on which word is emphasized. The same sentence can convey correction, contrast, or clarification depending on where emphasis falls. Although modern text-to-speech (TTS) systems generate expressive speech, it remains unclear whether they infer contextually appropriate stress from dis