(Sharing Experience) Qwen3.5-122B-A10B does not quantize well after Q4

r/LocalLLaMA
AI Research

Just a report of my own experiences: I've got 48GB of VRAM. I was excited that Qwen3.5-122B-A10B looked like a way to get Qwen3.5 27B's performance at 2-3x the inference speed with much lower memory needs for context. I had great experiences with Q4+ on 122B, but the heavy CPU offload meant I rarely beat 27B's TG speeds and significantly fell behind in PP speeds. I tried Q3_K_M with some CPU offload and UD_Q2_K_XL for 100% in-VRAM. With models > 100B total params I've had success in the past with this level of quantization so I figured it was worth a shot. Nope.