llama.cpp: Prefetching weights when offloading to CPU
r/LocalLLaMA
Hello r/LocalLLaMA, I put up an experimental PR that prefetches weights when offloading to CPU. Long story short, the results suggest it helps dense models and smaller MoE models. Give it a try if you are RAM-rich and GPU-poor like me.

Submitted by /u/am17an