You can do a lot with an old mobile GPU these days

Something I built. A conversational LLM chatbot, using speech-to-text and text-to-speech interfaces. The design goal was maximum conversational realism and engagement in a resource-constrained environment. In this, everything runs on a single RTX 3080 Mobile GPU with 16 GB VRAM total. Minimal system RAM usage and no Python dependencies. All components are built in C++ for speed. Components include: 1) Qwen3.5-9B UD-Q6_K_XL (GGUF)- LLM running on a (slightly) customized talk-llama.cpp example from GGML.org's whisper.cpp.