Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)
r/LocalLLaMA
In my opinion, MTP models are a real game changer for local LLMs. On speed, I was getting around 1.5x the tok/sec of my previous tests. The project was a test: building a full pygame iteratively, step by step, in this case a small mystery dungeon-style game. I started with the context set to 100-200k and later raised it to 300k. This is with the KV cache at Q8_0 quant. Edit: I was wrong, I had mistakenly left it at q4_0. I will redo the tests tomorrow with Q8. I use VSCodium and Roo.
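For anyone wondering why the q4_0 vs Q8_0 mix-up matters at 300k context, here is a rough back-of-the-envelope sketch of KV cache size. The bits-per-value figures follow the GGML block layouts (32 values per block plus an fp16 scale). The layer count, KV head count, and head dimension are placeholder numbers I made up for illustration, not the real config of this model, and the formula assumes every layer keeps a full K and V cache, which may not hold for every architecture.

```python
# Rough KV cache size estimate. Architecture numbers below are
# hypothetical placeholders, NOT the actual config of any specific model.

# Effective bits per cached value for each cache type:
# q8_0 = 32 int8 values + fp16 scale per block -> 34 bytes / 32 values
# q4_0 = 32 4-bit values + fp16 scale per block -> 18 bytes / 32 values
BITS_PER_VALUE = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Estimate KV cache size in GiB, assuming every layer stores full K and V."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return values * BITS_PER_VALUE[cache_type] / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical architecture, for illustration only.
    n_layers, n_kv_heads, head_dim = 40, 4, 128
    for n_ctx in (100_000, 200_000, 300_000):
        for cache_type in ("f16", "q8_0", "q4_0"):
            size = kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, cache_type)
            print(f"ctx={n_ctx:>7} {cache_type:>5}: {size:6.2f} GiB")
```

Whatever the real architecture is, the ratio stays the same: Q8_0 needs about 8.5 / 4.5 ≈ 1.9x the KV memory of q4_0 at the same context length, so a rerun at Q8 may not fit the same 300k context on the same hardware.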