Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post
r/LocalLLaMA
First, a little explanation of what is happening in the pictures. I ran a small experiment to see how much speculative decoding improves the generation speed of the new Qwen (TL;DR: a lot!). One image shows my simple prompt at the beginning of the session. Another image shows the time and token generation speed (13.60 t/s) for the first version of the program, along with my prompt asking for a new feature. A third image shows the time and token generation speed for the second version of the program (25.53 t/s, so you can already notice an improvement).
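For anyone who wants the improvement as a single number, here is a quick sketch of the arithmetic on the two speeds quoted above. The function and variable names are just my own for illustration, and keep in mind the two runs generated different text, so treat the ratio as a rough indication rather than a controlled benchmark.

```python
def speedup(baseline_tps: float, improved_tps: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline speed must be positive")
    return improved_tps / baseline_tps


# Token generation speeds reported in the post, in tokens per second.
first_run_tps = 13.60   # first version of the program
second_run_tps = 25.53  # second version of the program

ratio = speedup(first_run_tps, second_run_tps)
percent_gain = (ratio - 1.0) * 100.0

print(f"{ratio:.2f}x faster ({percent_gain:.1f}% more tokens per second)")
```

So the second generation ran at a bit under twice the speed of the first.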