Looking for recommendations for a small TTS model that can be fine-tuned on a local language dataset.

Looking for recommendations for a small TTS model (<600M params) that can be fine-tuned on a local language dataset. I have ~150 hours of very clean single-speaker audio with accurate transcripts/pronunciation, around 45,000 text rows.

I've tried:
• Orpheus: quality is good, but the model is too large
• Qwen3 0.6B: terrible results
• Qwen3 1.7B: too slow

Need something lightweight, easy to fine-tune locally, and good for low-resource/non-English languages. Would love recommendations from people who've actually fine-tuned smaller TTS models successfully.

submitted by /u/ContentAmbassador953