Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)


In my opinion, MTP models are a 100% game changer for local LLMs. Speed-wise, I was getting around 1.5x the tok/sec of my previous tests. The project was a test: iteratively building a full pygame step by step, a small mystery dungeon-style game. I started with the context set to 100-200k and later raised it to 300k. This is at KV Q8_0 quant. Edit: I was wrong, I had mistakenly left it at q4_0. I will redo the tests tomorrow with Q8. I use VSCodium and Roo.
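
For anyone wondering why the Q8_0 vs q4_0 mix-up matters at 300k context, here is a rough back-of-the-envelope sketch of KV cache size per cache type. The bytes-per-element figures follow the llama.cpp block layouts (32 values plus one fp16 scale per block). The layer count, KV head count, and head dimension are hypothetical placeholders, not the real numbers for this model, so swap in the values from your GGUF metadata and only count layers that actually keep a KV cache.

```python
# Rough KV cache size estimate for different cache quantization types.
# Bytes per element, based on llama.cpp block layouts (blocks of 32 values).
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one fp16 scale = 34 bytes per block
    "q4_0": 18 / 32,  # 32 4-bit values + one fp16 scale = 18 bytes per block
}


def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Approximate size of the K and V caches in GiB at a given context length."""
    elems = 2 * n_ctx * n_layers * n_kv_heads * head_dim  # factor 2 = K plus V
    return elems * BYTES_PER_ELEM[cache_type] / 2**30


# HYPOTHETICAL model dimensions, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 4, 128

if __name__ == "__main__":
    for n_ctx in (100_000, 200_000, 300_000):
        for cache_type in ("f16", "q8_0", "q4_0"):
            size = kv_cache_gib(n_ctx, N_LAYERS, N_KV_HEADS, HEAD_DIM, cache_type)
            print(f"ctx={n_ctx:>7} {cache_type:>5}: {size:6.2f} GiB")
```

Whatever the real dimensions are, the ratio stays the same: a Q8_0 cache takes a bit under twice the memory of a q4_0 cache at the same context length, so the Q8 rerun will need more VRAM headroom for 300k.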