Show HN: Glq, LLM quantization using the E8 lattice

With the help of AI, I have created glq, an open source library for LLM quantization using an E8 lattice codebook. I am a PC gamer and work in DevOps, with an interest in both LLMs and AI, and that is what got me started on glq. The current high RAM prices and the resource usage of LLMs also inspired me to write it. The question I wanted to explore: can you squeeze more out of a gaming GPU with limited VRAM by using alternative LLM compression methods? glq is effective compared with other LLM quantization algorithms in the range of 2 to 4 bits per weight.
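
For readers who have not met the E8 lattice before, here is a minimal sketch of the textbook nearest-point rounding step (the Conway and Sloane decoder), assuming NumPy. It is only an illustration of what "snapping 8 weights to an E8 point" means and is not taken from glq's code; the function names `nearest_d8` and `nearest_e8` are hypothetical, and a real quantizer would also need scaling and a finite codebook on top of this.

```python
import numpy as np

def nearest_d8(x):
    """Nearest point of D8 (integer vectors with an even coordinate sum)."""
    r = np.rint(x)
    if int(r.sum()) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest
        # rounding error in the opposite direction.
        err = x - r
        i = int(np.argmax(np.abs(err)))
        r[i] += 1.0 if err[i] >= 0 else -1.0
    return r

def nearest_e8(x):
    """Nearest point of E8 = D8 union (D8 + 1/2) to an 8-dimensional vector."""
    x = np.asarray(x, dtype=np.float64)
    a = nearest_d8(x)              # candidate from the integer coset
    b = nearest_d8(x - 0.5) + 0.5  # candidate from the half-integer coset
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b

# Illustration only: snap one block of 8 made-up "weights" to the lattice.
rng = np.random.default_rng(0)
block = rng.normal(size=8)
print(block)
print(nearest_e8(block))
```

The point of working on blocks of 8 values at once is that E8 packs points more densely than a plain per-weight grid, so the same number of bits can give a smaller rounding error.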