Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B

I'm using the fork for Bonsai and regular llama.cpp for Gemma.

Sizes without embedding parameters:

Gemma 4 has 2.3B at 4.8 bpw (Q4_K_M) = 1104 MB

Bonsai-8B has 6.95B at 1.125 bpw (Q1_0) = 782 MB (only 29% smaller)

I could've gone with a smaller quant of Gemma 4, but conventional wisdom is not to push small models below Q4_K_M.

I might try their ternary model later, but I don't have much hope.
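For anyone checking the "29% smaller" figure, here is a minimal sketch that recomputes it from the two sizes quoted above. The helper name `percent_smaller` is made up for illustration, and the sizes are taken as reported rather than re-measured.

```python
# Sizes as reported in the post (MB, embedding parameters excluded).
gemma_mb = 1104   # Gemma 4 at Q4_K_M
bonsai_mb = 782   # Bonsai-8B at Q1_0


def percent_smaller(small, large):
    """Return how much smaller `small` is than `large`, in percent."""
    return (1 - small / large) * 100


saving = percent_smaller(bonsai_mb, gemma_mb)
print(f"Bonsai-8B is {saving:.0f}% smaller than Gemma 4")
```

So Bonsai-8B comes out a bit under a third smaller on disk, which is the size gap the comparison is made at.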