1-bit LLMs on device?!

r/LocalLLaMA

Everyone's talking about the Claude Code stuff (rightfully so), but this paper came out today, and the claims are pretty wild:

- 1-bit 8B-param model that fits in 1.15 GB of memory
- competitive with Llama 3 8B and other full-precision 8B models on benchmarks
- runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro
- they got it running on an i… at ~40 tok/s
- 4-5x more energy efficient

Also, it's up on Hugging Face! I haven't played around with it yet, but I'm curious to know what people think about this one.
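
For anyone who wants to sanity-check the memory claim, here's a rough back-of-the-envelope sketch. It assumes exactly 8e9 parameters and 1 GB = 1e9 bytes (both are my assumptions, not numbers from the paper), and it only counts weight storage, not activations or KV cache. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Counts weights only; activations and KV cache are ignored.
    """
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9           # assumption: exactly 8 billion parameters
CLAIMED_FOOTPRINT_GB = 1.15  # figure quoted in the post

fp16_gb = weight_memory_gb(N_PARAMS, 16)    # typical "full precision" baseline
one_bit_gb = weight_memory_gb(N_PARAMS, 1)  # ideal 1 bit per weight

# Effective bits per weight implied by the claimed footprint.
implied_bits = CLAIMED_FOOTPRINT_GB * 1e9 * 8 / N_PARAMS

print(f"FP16 weights:  {fp16_gb:.2f} GB")
print(f"1-bit weights: {one_bit_gb:.2f} GB")
print(f"Implied bits per weight at {CLAIMED_FOOTPRINT_GB} GB: {implied_bits:.2f}")
```

Under those assumptions, pure 1-bit weights come to about 1 GB versus about 16 GB at FP16, so 1.15 GB is in the right ballpark. The extra ~0.15 GB presumably covers things like scaling factors or a few layers kept at higher precision, but that part is a guess on my end.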