ik_llama.cpp gives 26x faster prompt processing on Qwen 3.5 27B: real-world numbers
r/LocalLLaMA
I've been running Qwen 3.5 27B Q4_K_M on a Blackwell RTX PRO 4000 (24GB) for agentic coding work and hit a wall with mainline llama.cpp. Switched to the ik_llama.cpp fork today and the difference is staggering. Posting real numbers in case it helps others.