Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark
arXiv CS.CL
arXiv:2604.10580v1 Announce Type: new
Spoken meaning often depends not only on what is said, but also on which word is emphasized. The same sentence can convey correction, contrast, or clarification depending on where emphasis falls. Although modern text-to-speech (TTS) systems generate expressive speech, it remains unclear whether they infer contextually appropriate stress from dis…