No, you don't need a "Datacenter" to run the big models (Deepseek, GLM, Kimi, etc.) (just offload to CPU... and have patience)

r/LocalLLaMA

I still see people here, now and then, saying things like "but you need a Datacenter to run (GLM, Deepseek, Kimi, etc.)" or "if you run them you must be rich", and similar. The last time I read that was just a few days ago, about (I think) qwen3.5-397B. I've been running these models for many months now, and I still wonder why people keep saying that, or why they're surprised that someone with 2x4090s (or more, or bigger cards) can run them.

My setup: an RTX 5000 Ada (slower than an RTX 5090), 128 GB of RAM, and the models stored on an NVMe drive. With that, I do run them at over 1 t/s.
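A rough back-of-envelope sketch of why this works: it estimates the size of the quantized weights and how much of it does not fit in VRAM + RAM. If the runtime memory-maps the model file, that excess gets paged in from the NVMe drive as needed, which is where the "have patience" part comes from. The 4.5 bits per weight is a hypothetical average for a ~4-bit quant, the 32 GB is the RTX 5000 Ada's VRAM, and the 671B entry is a DeepSeek-V3-sized model. KV cache, activations, and OS overhead are ignored, so real numbers will be somewhat worse.

```python
# Back-of-envelope: how much of a quantized model fits in VRAM + RAM, and how
# much must be paged in from disk? Decimal GB throughout. Ignores KV cache,
# activations and OS overhead (assumption), so the spill figure is a lower bound.

def weights_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * 1e9 * bits_per_weight / 8 / 1e9


def spill_to_disk_gb(model_gb: float, vram_gb: float, ram_gb: float) -> float:
    """GB of weights that do not fit in VRAM + RAM and must be read from disk."""
    return max(0.0, model_gb - (vram_gb + ram_gb))


if __name__ == "__main__":
    VRAM_GB = 32   # RTX 5000 Ada
    RAM_GB = 128   # system RAM from the post
    BPW = 4.5      # hypothetical average bits per weight for a ~4-bit quant

    for name, params_b in [("397B model", 397), ("671B model", 671)]:
        size = weights_gb(params_b, BPW)
        spill = spill_to_disk_gb(size, VRAM_GB, RAM_GB)
        print(f"{name}: ~{size:.0f} GB of weights, ~{spill:.0f} GB beyond VRAM+RAM")
```

Under these assumptions a 397B model comes out to roughly 223 GB of weights, about 63 GB more than VRAM + RAM can hold, so part of every pass leans on the NVMe drive.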