Built a Voice Agents from Scratch GitHub tutorial: mic > Whisper > local LLM (GGUF) > Kokoro > speaker, fully local, no API keys
r/LocalLLaMA
Been building this for a while and finally cleaned it up enough to share. voice-agents-from-scratch is a numbered, chapter-by-chapter repo that walks through the full real-time pipeline:

- Mic capture
- Whisper for STT
- Local GGUF LLM (via llama.cpp)
- Kokoro for TTS
- Speaker output

Everything streams - you don't wait for the full LLM response before TTS starts speaking. That's the part that makes it feel like a real conversation instead of a chatbot with a voice skin.
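To give a rough idea of what "streaming into TTS" can look like, here is a minimal sketch that cuts a token stream into sentences and hands each one off as soon as it is complete. This is not the repo's code: `sentences_from_tokens`, `fake_llm_stream`, and `speak` are hypothetical names, the LLM is faked with a hardcoded list of tokens, and `speak` only prints where a real TTS call would go. Splitting on sentence punctuation is just one possible chunking strategy.

```python
import re
from typing import Iterable, Iterator, List

# A sentence boundary here is ., ! or ? followed by whitespace (an assumption;
# real text with abbreviations or decimals would need a smarter rule).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as it is complete in the token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part is still being generated, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


def fake_llm_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM: yields text fragments."""
    yield from ["Hel", "lo the", "re. ", "How ", "are you", "? ", "I am", " local."]


spoken: List[str] = []


def speak(text: str) -> None:
    """Hypothetical TTS hook: a real pipeline would synthesize audio here."""
    spoken.append(text)
    print(f"[TTS] {text}")


for sentence in sentences_from_tokens(fake_llm_stream()):
    speak(sentence)
```

The first sentence reaches `speak` while the fake LLM is still producing the rest, which is the effect described above. In a real pipeline the LLM and TTS would typically run concurrently (threads or async tasks with a queue between them) so synthesis of one sentence overlaps with generation of the next.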