llama.cpp and Qwen CPU Only
r/LocalLLaMA
I have a ProLiant DL360 Gen server with dual Xeon E5-2620 v4 CPUs @ 2.10 GHz and all memory banks populated, for a total of 128 GB of RAM. I'm trying to get llama.cpp to run Qwen CPU-only, for now in a VM on Proxmox for testing. No matter which model I choose (for example Qwen3.5-35b-a3b-q4_k_m), the CPU is pinned by even a basic "hello".

It's basically unusable: it never responds fully, and if I left it running it would probably take hours. I have tried so many times, and any advice you can give me would be greatly appreciated! I'm even willing to accept "you're an idiot, go play video games instead".