MISO-TTS: 8-billion-parameter text-to-speech model released.

Model: Miso TTS 8B is a text-to-speech model based on the Sesame CSM architecture. It generates Mimi audio codes from text and optional audio context, using a large Llama 3.2-style backbone and a smaller autoregressive audio decoder. The model is designed for high-quality conversational speech generation and voice continuation from prompt audio.

Submitted by /u/AgeNo5351
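For readers unfamiliar with the backbone-plus-decoder split described in the post, here is a rough toy sketch of how such a two-stage loop is usually organized. It is not the Miso TTS code. `backbone_step`, `decoder_step`, and `generate` are hypothetical stand-ins that return random codes, and the codebook count and size are assumptions, not values stated in the post. The only point is the control flow: per audio frame, the large model produces the first code and the small decoder fills in the remaining codes one at a time.

```python
import random

# Assumed values for illustration only; the post does not state them.
NUM_CODEBOOKS = 32     # codes per audio frame
CODEBOOK_SIZE = 2048   # entries per codebook


def backbone_step(text_tokens, frames, rng):
    """Hypothetical stand-in for the large backbone.

    A real backbone would attend over the text and all previous audio
    frames; here we return a dummy hidden state and a random first code.
    """
    hidden = (len(text_tokens), len(frames))
    code0 = rng.randrange(CODEBOOK_SIZE)
    return hidden, code0


def decoder_step(hidden, codes_so_far, rng):
    """Hypothetical stand-in for the small autoregressive audio decoder.

    A real decoder would condition on the hidden state and the codes
    already chosen for this frame; here the next code is random.
    """
    return rng.randrange(CODEBOOK_SIZE)


def generate(text_tokens, context_frames=(), num_frames=5, seed=0):
    """Generate num_frames new frames, optionally continuing prompt audio."""
    rng = random.Random(seed)
    frames = list(context_frames)  # optional audio context (prompt frames)
    for _ in range(num_frames):
        hidden, code0 = backbone_step(text_tokens, frames, rng)
        codes = [code0]
        # The decoder fills the remaining codebooks one at a time.
        while len(codes) < NUM_CODEBOOKS:
            codes.append(decoder_step(hidden, codes, rng))
        frames.append(codes)
    # Return only the newly generated frames, not the prompt.
    return frames[len(context_frames):]
```

In a real pipeline the returned frames would be passed to the Mimi decoder to produce a waveform. Passing `context_frames` corresponds to the voice-continuation use case mentioned in the post.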