Qwen 3.6 27B Q8 on four Nvidia RTX A4000 (16GB each) with Llama.cpp and MTP enabled
r/LocalLLaMA
My setup is heterogeneous. I originally bought my server (a Lenovo ThinkStation P3 Tower Gen 2) to run OpenShift/K8s clusters, because that is what I work on. Later I started buying Nvidia RTX A4000 cards one by one, each with 16GB of VRAM. Yes, it's old technology, but hear me out: 140W per card, and one PCIe slot per card. I can fit four cards in my server.
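As a rough back-of-the-envelope check on those numbers, the sketch below adds up the VRAM and power for four cards and estimates how much of that VRAM the Q8 weights of a 27B model would take. The 8.5 bits per weight figure is an assumption based on the usual Q8_0 block layout (32 int8 values plus one fp16 scale), and the estimate ignores the KV cache, compute buffers, and any MTP-related overhead, so treat the headroom as an upper bound.

```python
# Figures taken from the post.
NUM_GPUS = 4
VRAM_PER_GPU_GB = 16
POWER_PER_GPU_W = 140
PARAMS_BILLION = 27

# Assumption: Q8_0 stores 32 weights in 34 bytes, i.e. 8.5 bits per weight.
BITS_PER_WEIGHT = 8.5

total_vram_gb = NUM_GPUS * VRAM_PER_GPU_GB          # combined VRAM across cards
total_power_w = NUM_GPUS * POWER_PER_GPU_W          # combined power of the cards alone
weights_gb = PARAMS_BILLION * BITS_PER_WEIGHT / 8   # approximate size of the weights
headroom_gb = total_vram_gb - weights_gb            # left for KV cache, buffers, etc.

print(f"Total VRAM:        {total_vram_gb} GB")
print(f"Total GPU power:   {total_power_w} W")
print(f"Q8 weights (est.): {weights_gb:.1f} GB")
print(f"Headroom (est.):   {headroom_gb:.1f} GB")
```

The point of the estimate is only that the weights do not fit on one or two 16GB cards at Q8, which is why all four are in play here.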