Bonsai (PrismML's 1-bit version of Qwen3 8B, 4B, and 1.7B) was not an April Fools' joke
r/LocalLLaMA
I read the article yesterday and watched the only 3 videos that had surfaced about these Bonsai models. It seemed legit, but it could still have been an April Fools' joke. So today I woke up wanting to try them. I downloaded their 8B model and their llama.cpp fork, tested it, and as far as I can see it's real: on my humble 4060 I get 107 t/s generation and >1114 t/s prompt processing, with a model that's evidently tiny. For comparison, on Qwen 3.5 4B Q4 I had gotten 56 t/s using the same prompts.
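For anyone who wants the ratio spelled out, here is a quick back-of-the-envelope sketch using only the numbers above. The variable and function names are mine, and I am assuming the 56 t/s figure is generation speed. Keep in mind the two models differ in parameter count and quantization, so this is a rough comparison, not a controlled benchmark.

```python
# Throughput figures quoted in the post (tokens per second, on a 4060).
bonsai_8b_gen_tps = 107.0   # Bonsai 8B (1-bit), generation
qwen_4b_q4_tps = 56.0       # Qwen 4B Q4, same prompts (assumed to be generation speed)


def speedup(new_tps: float, baseline_tps: float) -> float:
    """Return how many times faster new_tps is than baseline_tps."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return new_tps / baseline_tps


def ms_per_token(tps: float) -> float:
    """Convert tokens per second into average milliseconds per token."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return 1000.0 / tps


ratio = speedup(bonsai_8b_gen_tps, qwen_4b_q4_tps)
print(f"Generation speedup: {ratio:.2f}x")
print(f"Bonsai 8B: {ms_per_token(bonsai_8b_gen_tps):.2f} ms/token")
print(f"Qwen 4B Q4: {ms_per_token(qwen_4b_q4_tps):.2f} ms/token")
```

So the 8B 1-bit model comes out at a bit under twice the token rate of the 4B Q4 model on the same card, despite having roughly double the parameters.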