Qwen3-TTS + qwen3.6-35B for a voice agent pipeline — 3 weeks of notes
r/LocalLLaMA
Saw the Qwen3-TTS thread this morning and it finally pushed me to write this up. Background: I've been building a local voice assistant for a client over the past 3 weeks. It's a voice-first interface on top of a RAG backend -- the client needs responses that feel conversational, not a typing test where you wait for the cursor to stop. TTS was the weak link. Tried Kokoro first, which is solid for narration but gets flat on short phrases like "got it" or "sure, one sec" -- the kind of back-and-forth that dominates voice interfaces.