Brief Ngram-Mod Test Results - R9700/Qwen3.6 27B

r/LocalLLaMA

Decided to try out the new --spec-type ngram-mod feature in llama.cpp with Qwen3.6 27B during an OpenCode bug-chasing session. TL;DR: performance is variable, but so far it seems to give a nice speed increase when working on the same code base.

Here's a baseline llama-bench test:

$: llama-bench-vulkan -m 'Qwen3.6-27B-UD-Q4_K_XL.gguf'
WARNING: rad is not a conformant Vulkan implementation, testing use only.