llama.cpp -ngl 0 still shows some GPU usage?

My llama.cpp is compiled with CUDA, OpenBLAS and AVX512. As I'm experimenting, I'm trying to have inference happen purely on the CPU for now. However, -ngl 0 still seems to make use of the GPU: I see a spike in GPU utilization and GPU memory usage (in nvtop) when loading the model via llama-cli. How can one explain that? submitted by /u/sob727
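
One way to narrow this down is to hide the GPU from the process entirely and see whether the spike in nvtop goes away. The sketch below is only a test harness, and it rests on an assumption: that the activity comes from the CUDA backend starting up in a CUDA-enabled build, not from offloaded layers. It sets `CUDA_VISIBLE_DEVICES` to an empty string, which makes the CUDA runtime report no devices, and then launches `llama-cli` with `-ngl 0`. The helper names `cpu_only_env` and `build_command` are hypothetical, and the model path is a placeholder passed on the command line.

```python
import os
import subprocess
import sys


def cpu_only_env(base=None):
    """Return a copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base is None else base)
    # An empty CUDA_VISIBLE_DEVICES makes the CUDA runtime report zero devices.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def build_command(model_path, prompt="Hello"):
    """Build a llama-cli invocation with no layers offloaded (hypothetical helper)."""
    # -m, -ngl and -p are llama-cli options; model_path is a placeholder.
    return ["llama-cli", "-m", model_path, "-ngl", "0", "-p", prompt]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_cpu_only.py /path/to/model.gguf
    subprocess.run(build_command(sys.argv[1]), env=cpu_only_env(), check=True)
```

If nvtop stays flat with the GPU hidden but spikes without it, that would suggest the usage comes from the CUDA backend itself and not from `-ngl`. A build compiled without CUDA would be the other way to compare.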