2-bit quants (maybe even 1-bit) not as bad as you'd think?
r/LocalLLaMA
Open Source AI
I was just reading something that a comment on here (which I can't find now) linked to. A guy benchmarked 1-bit through 4-bit quants on a limited subset of MMLU-Pro, GPQA Diamond, LiveCodeBench, and Math-500. He tested 2 models at various Q1-Q4 quants: Qwen3.5 397B A17B and MiniMax-M2.5 229B A10B. For Qwen 397B, not only is IQ2 pretty close to Q4 on real benchmarks, but even Q1 is closer than you'd think. For MiniMax, however, it was a total catastrophe: even Q4 is further from BF16 than Qwen at Q1 is from its own BF16.
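
To make the "further away from BF16" comparison concrete, here's a rough Python sketch of how I'd compare the gaps. The function names are mine and the scores are made-up placeholders, NOT the numbers from that benchmark (I don't have them in front of me). It just shows the arithmetic: measure each quant against its own model's BF16 score, then compare the gaps across models.

```python
def gap_to_baseline(quant_score: float, baseline_score: float) -> float:
    """Absolute points lost relative to the unquantized (BF16) score."""
    return baseline_score - quant_score


def retained_fraction(quant_score: float, baseline_score: float) -> float:
    """Fraction of the BF16 score that the quant keeps."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return quant_score / baseline_score


# Placeholder scores for illustration only, not real benchmark results.
model_a = {"BF16": 80.0, "Q1": 72.0}  # hypothetical model that quantizes well
model_b = {"BF16": 80.0, "Q4": 60.0}  # hypothetical model that quantizes badly

gap_a_q1 = gap_to_baseline(model_a["Q1"], model_a["BF16"])
gap_b_q4 = gap_to_baseline(model_b["Q4"], model_b["BF16"])

# The claim in the post has this shape: model B at Q4 loses more
# than model A at Q1, each measured against its own BF16 baseline.
b_q4_worse_than_a_q1 = gap_b_q4 > gap_a_q1

print(f"model A, Q1: -{gap_a_q1:.1f} pts, "
      f"keeps {retained_fraction(model_a['Q1'], model_a['BF16']):.0%}")
print(f"model B, Q4: -{gap_b_q4:.1f} pts, "
      f"keeps {retained_fraction(model_b['Q4'], model_b['BF16']):.0%}")
```

Comparing against each model's own BF16 baseline matters because the two models don't start from the same raw scores, so raw quant scores alone wouldn't tell you which one degrades more.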