Qwen3.5-397B is shockingly useful at Q2
r/LocalLLaMA
Quick specs. This is a workstation that was morphed into something LocalLLaMA-friendly over time:

3950X
96GB DDR4 (dual channel, running at 3000 MHz)
W6800 + RX 6800 (48GB of VRAM at ~512GB/s)
most tests done with ~20k context; K-cache at q8_0
llama.cpp main branch with ROCm

The model used was the UD_IQ2_M weights from Unsloth, which is ~122GB on disk.

I have not had success with Q2 levels of quantization since Qwen3-235B, so I was assuming this test would be a throwaway like all of my recent tests, but it turns out it's REALLY good and somewhat usable.
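
For anyone wondering how a ~122GB file fits on this box, here is a rough back-of-the-envelope sketch in Python using only the numbers quoted above. It assumes GB means 10^9 bytes, that 397B is the total parameter count (taken from the model name), and it ignores KV cache, compute buffers, and OS overhead, so treat the result as an approximation and not a measurement.

```python
# Rough memory budget from the figures in the post.
# Assumptions: GB = 10^9 bytes; 397B total parameters (from the model name);
# KV cache, compute buffers, and OS overhead are ignored.

MODEL_SIZE_GB = 122    # UD_IQ2_M weights on disk
VRAM_GB = 48           # W6800 + RX 6800 combined
SYSTEM_RAM_GB = 96     # DDR4
PARAMS_BILLION = 397   # assumed total parameter count


def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * 8 / params_billion


def memory_budget(model_gb: float, vram_gb: float, ram_gb: float) -> dict:
    """Split the model across VRAM and system RAM and report what is left."""
    total = vram_gb + ram_gb
    return {
        "total_gb": total,
        "headroom_gb": total - model_gb,
        # Whatever does not fit in VRAM has to sit in system RAM.
        "min_in_system_ram_gb": max(0, model_gb - vram_gb),
    }


budget = memory_budget(MODEL_SIZE_GB, VRAM_GB, SYSTEM_RAM_GB)
bpw = bits_per_weight(MODEL_SIZE_GB, PARAMS_BILLION)

print(f"average bits per weight: {bpw:.2f}")
print(f"combined memory: {budget['total_gb']} GB")
print(f"headroom after weights: {budget['headroom_gb']} GB")
print(f"weights that must live in system RAM: {budget['min_in_system_ram_gb']} GB")
```

Under those assumptions, at least 74GB of the weights end up in system RAM, which leaves about 22GB of combined headroom for context and everything else.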