ik_llama.cpp gives 26x faster prompt processing on Qwen 3.5 27B — real world numbers

r/LocalLLaMA

I've been running Qwen 3.5 27B Q4_K_M on a Blackwell RTX PRO 4000 (24GB) for agentic coding work and hit a wall with mainline llama.cpp. Switched to the ik_llama.cpp fork today and the difference is staggering. Posting real numbers in case it helps others.