Can we offload only the MoE layers to the GPU and keep everything else in RAM? I've explained the details in the body text.

r/LocalLLaMA

Basically, I’ve seen people use unified-memory systems to run 120B-parameter models at an affordable cost.