Please, anyone 👉 Can we offload only the MoE layers to the GPU and keep everything else in RAM? See the body text, I have explained it there.
r/LocalLLaMA
Basically, I’ve seen people using unified memory systems to run 120B models at an affordable cost.