llama.cpp -ngl 0 still shows some GPU usage?

r/LocalLLaMA

My llama.cpp is compiled with CUDA, OpenBLAS and AVX512. As I'm experimenting, I'm trying to have inference happen purely on the CPU for now. However, -ngl 0 still seems to make use of the GPU: I see a spike in GPU processor and RAM usage (using nvtop) when loading the model via llama-cli. How can one explain that?

submitted by /u/sob727