llama.cpp: Prefetching weights when offloading to CPU

Hello r/LocalLLaMA, I put up an experimental PR that prefetches weights when offloading to CPU. Long story short: going by the results so far, it helps dense models and smaller MoE models. Give it a try if you are RAM-rich and GPU-poor like me.

Submitted by /u/am17an