Llama-server: is it bleeding to CPU/RAM?
r/LocalLLaMA
Is there an easy way to tell whether a model is also using CPU/RAM, not just GPU/VRAM? I don't think the standard verbose output (which has gotten shorter) says anything about this, but I may be missing something.

Submitted by /u/jopereira
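One possible check, sketched under assumptions: llama.cpp has historically printed a line like `load_tensors: offloaded 33/33 layers to GPU` while loading a model. The prefix varies between versions, and the line may be missing at lower verbosity. If the first number is lower than the second, some layers stay in system RAM and run on the CPU. The helper below, `offload_status`, is hypothetical. It just scans captured server log text for that assumed line.

```python
import re
from typing import Optional, Tuple

# Assumed log format (llama.cpp model loading), e.g.
#   "load_tensors: offloaded 33/33 layers to GPU"
# The prefix before "offloaded" differs between versions, so only the tail is matched.
OFFLOAD_RE = re.compile(r"offloaded\s+(\d+)\s*/\s*(\d+)\s+layers\s+to\s+GPU")


def offload_status(log_text: str) -> Optional[Tuple[int, int, bool]]:
    """Return (offloaded, total, all_on_gpu) from the last matching log line.

    Returns None when no such line is found, e.g. if the log format changed
    or the line was not printed at the current verbosity level.
    """
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    offloaded, total = (int(x) for x in matches[-1])
    return offloaded, total, offloaded == total


if __name__ == "__main__":
    sample = "load_tensors: offloaded 20/33 layers to GPU\n"
    status = offload_status(sample)
    if status is None:
        print("no offload line found")
    else:
        done, total, full = status
        print(f"{done}/{total} layers on GPU; full offload: {full}")
        # → 20/33 layers on GPU; full offload: False
```

The number of layers offloaded is controlled by `-ngl` / `--n-gpu-layers`. Even a full count does not by itself prove that nothing else, such as the KV cache or other buffers, lives in host memory. Treat this as a quick first check rather than a complete answer.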