24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)
r/LocalLLaMA
I got Qwen 3.6 35B-A3B and Gemma 4 26B-A4B running on a $200 secondhand machine (i7-6700 / GTX 1080 / 32 GB RAM) using llama.cpp (the TurboQuant/RotorQuant KV cache quantisation allows 128k context within the 8