1-bit llms on device?!
r/LocalLLaMA
Everyone's talking about the Claude Code stuff (rightfully so), but this paper came out today, and the claims are pretty wild:

- 1-bit 8B-param model that fits in 1.15 GB of memory
- competitive with Llama 3 8B and other full-precision 8B models on benchmarks
- runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro
- they got it running on an iat ~40 tok/s
- 4-5x more energy efficient

Also, it's up on Hugging Face! I haven't played around with it yet, but I'm curious to know what people think about this one.
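For anyone who wants to sanity-check the memory claim, here's a quick back-of-the-envelope sketch. It assumes "8b" means exactly 8 billion weights and that 1 GB = 1e9 bytes; neither is confirmed by the post, and the helper name `model_size_gb` is just made up for illustration.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB (1 GB = 1e9 bytes), ignoring all overhead."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9      # assumption: "8b" is exactly 8 billion weights
CLAIMED_GB = 1.15   # memory figure quoted in the post

fp16_gb = model_size_gb(N_PARAMS, 16)   # same model stored at 16 bits per weight
one_bit_gb = model_size_gb(N_PARAMS, 1)  # ideal case: exactly 1 bit per weight

# Bits per weight implied by the claimed footprint
effective_bits = CLAIMED_GB * 1e9 * 8 / N_PARAMS

print(f"FP16 weights:        {fp16_gb:.2f} GB")
print(f"Ideal 1-bit weights: {one_bit_gb:.2f} GB")
print(f"Implied bits/weight: {effective_bits:.2f}")
print(f"Shrink vs FP16:      {fp16_gb / CLAIMED_GB:.1f}x")
```

Under those assumptions, 1.15 GB is roughly what you'd expect for a model that is close to 1 bit per weight plus a bit of extra. The post doesn't say where the extra goes, so treat the split as unknown.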