IK_LLAMA now supports Qwen3.5 MTP :O
r/LocalLLaMA
Compile, compile, compile! Will be testing shortly!

EDIT: You will need a GGUF with the MTP layers preserved. The PR creator made some GGUFs of Q3.6 27B at Q8_0 here -

EDIT 2: IT WORKS! Noticeable speedup (an extra 10 t/s) with pipeline parallelism and MTP at draft-max 1. I went from 18-20 t/s to 30 t/s.
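
For anyone who wants the relative speedup rather than raw t/s, here is a quick back-of-the-envelope sketch in Python. It only uses the numbers from this post (18-20 t/s before, 30 t/s after); the function and variable names are made up for illustration, and your results will depend on your hardware, quant, and settings.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """Ratio of generation throughput after vs. before enabling MTP."""
    return after_tps / before_tps


# Numbers reported in the post (tokens per second).
baseline_low, baseline_high = 18.0, 20.0
with_mtp = 30.0

# The faster baseline gives the smaller speedup, and vice versa.
speedup_min = speedup(baseline_high, with_mtp)
speedup_max = speedup(baseline_low, with_mtp)

print(f"Speedup: {speedup_min:.2f}x to {speedup_max:.2f}x")
# prints "Speedup: 1.50x to 1.67x"
```

So the jump works out to roughly 1.5x to 1.67x on this particular setup.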