(Very) High-Quality Attention Coder-Next GGUFs

r/LocalLLaMA
Generative AI

I've been conducting a bunch of quantization experiments on Qwen3-Coder-Next while using it for downstream client programming and data processing tasks, and I'd like to share some of my experience and thoughts with the community, as well as some quants with (very) high-quality attention tensors. One of the first things I noticed while quantizing Coder-Next (indeed any 3.5 MoE models) is that the attention tensors are small. Like: 16-32MB per tensor per layer small.