Cloudflare open-sources lossless LLM compression tool
r/LocalLLaMA
Cloudflare has released Unweight, a lossless compression system that reduces LLM size by 15-22% without sacrificing output accuracy. On Meta's Llama-3.1-8B, running on Nvidia H100 GPUs, the tool saves roughly 3 GB of VRAM by compressing the MLP weights. Cloudflare has open-sourced the GPU kernels on GitHub and published a technical paper, and it plans to extend compression to attention weights. Submitted by /u/Otis43.
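As a quick sanity check on the numbers in the post, here is a minimal sketch of the arithmetic. It assumes Llama-3.1-8B has about 8.03 billion parameters stored as 2-byte BF16 values; neither figure is stated in the post, so treat both as assumptions.

```python
# Back-of-the-envelope check of the post's figures.
# Assumptions (NOT from the post): ~8.03e9 parameters, 2 bytes each (BF16).
PARAMS = 8.03e9
BYTES_PER_PARAM = 2
GB = 1e9  # decimal gigabytes


def model_size_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM):
    """Uncompressed weight footprint in GB."""
    return params * bytes_per_param / GB


def savings_gb(fraction, size_gb):
    """GB saved when the model shrinks by the given fraction."""
    return fraction * size_gb


size = model_size_gb()
low = savings_gb(0.15, size)   # lower end of the quoted 15-22% range
high = savings_gb(0.22, size)  # upper end of the quoted range
implied = 3.0 / size           # fraction implied by "roughly 3 GB"

print(f"uncompressed weights: {size:.1f} GB")
print(f"15-22% reduction saves: {low:.1f} to {high:.1f} GB")
print(f"3 GB is {implied:.1%} of the model")
```

Under those assumptions, the quoted 3 GB saving falls inside the 15-22% range, so the two figures in the post are consistent with each other.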