I benchmarked 30+ TTS engines for a real-time translator on Apple M4. Quantization made things SLOWER. Here's all the data.

r/LocalLLaMA

I'm building a real-time speech translator (STT → LLM translation → TTS) and spent a couple of weeks benchmarking every TTS engine I could find, cloud and local, on a MacBook Air M4 with 24GB RAM. Some findings were not what I expected. I'm sharing everything because I couldn't find this data anywhere when I started.

The setup

Pipeline: Deepgram Nova-3 (STT, ~300ms) → Groq Llama 3.3 70B (translation, ~200ms) → TTS → speaker

The TTS component is the bottleneck. STT and LLM together take ~500ms. If TTS adds another second, the conversation feels like a walkie-talkie.
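To make the latency budget concrete, here is a rough sketch of the arithmetic. The STT and LLM figures are the approximate ones quoted above; the function names and the 1000ms TTS value are only illustrative (the "another second" case), not measurements, and the sketch assumes the stages run strictly one after another with no streaming overlap.

```python
# Approximate stage latencies in milliseconds, as quoted in the post.
STT_MS = 300  # Deepgram Nova-3
LLM_MS = 200  # Groq Llama 3.3 70B


def end_to_end_ms(tts_ms: float) -> float:
    """Total delay before audio plays, assuming the stages run sequentially."""
    return STT_MS + LLM_MS + tts_ms


def tts_share(tts_ms: float) -> float:
    """Fraction of the total delay spent in the TTS stage."""
    return tts_ms / end_to_end_ms(tts_ms)


if __name__ == "__main__":
    # Hypothetical TTS latency of one second (the "walkie-talkie" case).
    tts_ms = 1000
    print(f"total: {end_to_end_ms(tts_ms):.0f} ms")
    print(f"TTS share: {tts_share(tts_ms):.0%}")
```

With a one-second TTS stage the total comes to 1500ms, and TTS alone accounts for about two thirds of it, which is why that stage is the one worth optimizing.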