Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant
r/LocalLLaMA
I'm a master's student in Germany, and I got obsessed with one question: can you run a model that's "too big" for your hardware? After weeks of experimenting, I combined three techniques (lazy MoE expert loading, TurboQuant KV compression, and SSD streaming) into a working system. Here's what it looks like running on my laptop with Intel UHD 620 integrated graphics, 8GB RAM, and no dedicated GPU.

GitHub:

Would love feedback from this community!

submitted by /u/ReasonableRefuse4996
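
For readers unfamiliar with the idea, here is a rough sketch of what "lazy expert loading" could look like in general. It is not taken from the LazyMoE repo. It assumes a small LRU cache that keeps only a few experts resident in RAM and fetches the rest on demand. The names `LazyExpertCache` and `fake_load_from_ssd` are hypothetical, and the loader is a stand-in for reading one expert's weights from disk.

```python
from collections import OrderedDict
from typing import Callable, List


class LazyExpertCache:
    """LRU cache: an expert's weights are loaded only when a token is routed to it."""

    def __init__(self, load_expert: Callable[[int], List[float]], capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.load_expert = load_expert   # slow path, e.g. a read from SSD
        self.capacity = capacity         # max experts kept in RAM at once
        self.resident: "OrderedDict[int, List[float]]" = OrderedDict()
        self.loads = 0                   # how many times we hit the slow path

    def get(self, expert_id: int) -> List[float]:
        if expert_id in self.resident:
            # Cache hit: mark as most recently used.
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        # Cache miss: load from storage, then evict the least recently used expert.
        weights = self.load_expert(expert_id)
        self.loads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)
        return weights


def fake_load_from_ssd(expert_id: int) -> List[float]:
    # Hypothetical stand-in for reading one expert's weights from disk.
    return [float(expert_id)] * 4


cache = LazyExpertCache(fake_load_from_ssd, capacity=2)
for expert_id in [0, 1, 0, 2, 1]:    # expert ids a router might pick, in order
    cache.get(expert_id)
print("slow loads:", cache.loads, "resident:", list(cache.resident))
```

The point of the sketch is the trade-off: RAM use is bounded by `capacity`, and the cost shows up as extra storage reads whenever the router picks an expert that was evicted.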