llama.cpp -ngl 0 still shows some GPU usage?
r/LocalLLaMA
My llama.cpp build is compiled with CUDA, OpenBLAS and AVX512. While experimenting, I'm trying to run inference purely on the CPU for now. However, -ngl 0 still seems to make use of the GPU: I see a spike in GPU utilization and GPU memory usage (in nvtop) when loading the model via llama-cli. How can one explain that?

submitted by /u/sob727
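A minimal sketch of one way to narrow this down, assuming the spike comes from the CUDA backend being initialized rather than from any offloaded layers: launch the same llama-cli command with the GPU hidden from the process and compare what nvtop shows. CUDA_VISIBLE_DEVICES is an environment variable read by the CUDA runtime, not a llama.cpp option. The helper names and the model.gguf path are hypothetical placeholders, not part of the original setup.

```python
import os
import shlex


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES makes the CUDA runtime report no devices,
    # so a CUDA-enabled binary should have no GPU available to initialize.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def llama_cli_command(model_path, n_gpu_layers=0):
    """Build the llama-cli argument list; model_path is a placeholder."""
    return ["llama-cli", "-m", model_path, "-ngl", str(n_gpu_layers)]


if __name__ == "__main__":
    cmd = llama_cli_command("model.gguf")  # hypothetical model file
    # Print the equivalent shell invocation for reference.
    print("CUDA_VISIBLE_DEVICES='' " + shlex.join(cmd))
    # To actually launch it (requires llama-cli on PATH):
    #   import subprocess
    #   subprocess.run(cmd, env=cpu_only_env(), check=True)
```

If the GPU activity disappears with the device hidden but returns with a plain -ngl 0 run, that would point at the CUDA build itself rather than at layer offloading. A CPU-only build of llama.cpp would be another way to make the same comparison.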