Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

r/LocalLLaMA

As the title states, my build can run a 1-trillion-parameter model (in this case Kimi K2.5) locally at ~4 tokens/second. I thought r/LocalLLaMA would be interested in the build both for that number and for one unusual part, Intel Optane Persistent Memory, which I haven’t seen anyone use in an LLM inference build before. Optane PMem is a memory module in a DIMM form factor that behaves somewhere between DRAM and an