EVR-1 Maano: 3.93 GiB compression of Llama 3.1 8B. Under 6% repetition at 500 tokens where standard 3-4 bit quants hit 77-80%. Novel compression method, not standard quantisation.

Hey everyone, I'm Ibrahim from Evrmind, a UK start-up working on AI compression and edge compute. We've been working on a compression method that focuses on something most quant methods don't optimise for: whether the model actually produces coherent text beyond a few hundred tokens.

We're announcing EVR-1 Maano-8b: our 3.93 GiB compression of Llama 3.1 8B. It's been on HuggingFace quietly for a few days but this is the first proper announcement.

Download:

Binaries:

---

What is EVR-1?

EVR-1 is not GPTQ, AWQ, or any standard GGUF quantisation type.