Lost in Quantization Space: should I choose Qwen3.5:4B int8 or Qwen3.5:9B int4? Neither?

r/LocalLLaMA
Generative AI

I am a little bit lost: which one should I choose? My understanding is that a bigger model is generally better even when it is quantized, but that this is not true for all models. A smaller model also takes less RAM (here 6.88 vs 7.56), so I could increase the context length. I have a limited network connection (I can't download both models this month because of the data cap on my bill), so which one should I pick? Is another quantization better (GGUF, etc.)?

submitted by /u/Edereum
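
For a rough sense of the trade-off, weight memory can be approximated as parameter count times bits per weight, divided by 8. Below is a minimal back-of-envelope sketch, assuming nominal sizes of exactly 4e9 and 9e9 parameters and exactly 8 and 4 bits per weight. Those numbers and the helper name `weight_bytes` are illustrative assumptions, not measurements of the actual files.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


GB = 1e9  # decimal gigabytes, an assumption; some tools report GiB instead

# Nominal parameter counts and bit widths (assumed, see lead-in).
small_int8 = weight_bytes(4e9, 8) / GB
large_int4 = weight_bytes(9e9, 4) / GB

print(f"4B @ 8-bit: {small_int8:.2f} GB")
print(f"9B @ 4-bit: {large_int4:.2f} GB")
```

This counts weights only. It ignores the context (KV cache), runtime overhead, and any tensors stored at higher precision, so real usage such as the 6.88 vs 7.56 quoted above will be higher than these nominal figures.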