Introducing oQ: data-driven mixed-precision quantization for Apple Silicon (mlx-lm compatible)

r/LocalLLaMA

One of the things I found most frustrating while using mlx-lm was the quality of models quantized with a single uniform bit width. Sure, mlx-lm offers various quantization options, but for most users, downloading a full-precision model and quantizing it yourself is a real barrier. (Even if someone tells you it's easy. The fear of the CLI is real.) So I started thinking. Quantization should not be exclusive to any particular inference server.