24+ tok/s from ~30B MoE models on an old GTX 1080 (8 GB VRAM, 128k context)

r/LocalLLaMA

I got Qwen 3.6 35B-A3B and Gemma 4 26B-A4B running on a $200 secondhand machine (i7-6700 / GTX 1080 / 32 GB RAM) using llama.cpp (the TurboQuant/RotorQuant KV cache quantisation allows 128k context within the 8