Google TurboQuant running Qwen locally on a MacBook Air

r/LocalLLaMA

Hi everyone, we just ran an experiment: we patched llama.cpp with Google’s new TurboQuant compression method and ran Qwen 3.5-9B on a regular MacBook Air (M4, 16 GB) with a 20,000-token context. Previously, prompts with that much context were basically impossible to handle on this device; with the new algorithm, it now seems feasible. Imagine running OpenClaw on a regular device for free! Just a MacBook Air or Mac Mini, and not even a Pro model, just the cheapest ones. It’s still a bit slow, but newer chips are making it faster. Link to the macOS app: atomic.chat (open source and free).
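
For anyone wondering why context length is the bottleneck on a 16 GB machine, here is a rough back-of-the-envelope sketch. It assumes the savings come from storing the attention key/value cache at fewer bits per value, which the post above does not spell out. The model shape (layers, KV heads, head size) and the 4-bit figure are made-up placeholders, not the real Qwen config or TurboQuant's actual bit width, and the function `kv_cache_bytes` is just an illustrative helper.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bits_per_value):
    """Approximate KV cache size in bytes, rounded down to whole bytes."""
    # Factor 2: one key vector and one value vector per token, per layer, per KV head.
    n_values = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return n_values * bits_per_value // 8


# Hypothetical model shape (an assumption, NOT the real Qwen config).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
N_TOKENS = 20_000  # context length from the post

# 16 bits per value is a common uncompressed baseline;
# 4 bits is an assumed compressed width, chosen only for illustration.
fp16_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_TOKENS, 16)
low_bit_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_TOKENS, 4)

GIB = 2**30
print(f"16-bit cache: {fp16_bytes / GIB:.2f} GiB")
print(f" 4-bit cache: {low_bit_bytes / GIB:.2f} GiB")
```

With these placeholder numbers the cache goes from roughly 2.4 GiB to roughly 0.6 GiB, and that memory sits on top of the model weights. Plug in the real layer and head counts from the model config to get an estimate for your own setup.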