Gradually increasing memory use - is there a memory leak in llama.cpp?
r/LocalLLaMA
I've got a 128GB Strix Halo box. Yesterday I wanted to try out Step-3.5-flash. It's a model that barely fits on my system as is: I found a bartowski Q4_XS that's 105GB. With about 150K context it takes up about 108GB. That leaves about 20GB minus what Linux is taking, so roughly 17GB free. I ran opencode --continue so that I could try the model out on a previous session's context. What I noticed was that with each query the memory use (monitored in htop) bumped up but never went all the way back down to its previous level. After a while it was up to 120GB.
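
In case anyone wants numbers rather than eyeballing htop, here is a rough Python sketch that logs the resident memory (VmRSS) of a single process from /proc/<pid>/status at a fixed interval and prints the change between samples. The helper names (parse_vmrss_kib, read_rss_kib, growth_per_step) and the 5-second interval are my own choices for illustration, not anything from llama.cpp. It assumes Linux and that you pass the PID of the server process yourself. One caveat: on a unified-memory APU, GPU-side allocations may not show up in the process's RSS, so the system-wide figure in htop can differ from what this reports.

```python
import sys
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())


def growth_per_step(samples: list[int]) -> list[int]:
    """Difference between each sample and the one before it."""
    return [b - a for a, b in zip(samples, samples[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])   # PID of the process to watch (supplied by you)
    interval_s = 5           # sampling interval, an arbitrary choice
    previous = None
    try:
        while True:
            rss = read_rss_kib(pid)
            delta = 0 if previous is None else rss - previous
            print(f"{time.strftime('%H:%M:%S')}  "
                  f"RSS {rss / 1024**2:.2f} GiB  "
                  f"delta {delta / 1024:+.1f} MiB")
            previous = rss
            time.sleep(interval_s)
    except KeyboardInterrupt:
        pass
```

Save it under any name, run it with the PID as the only argument, and send a few queries. If the delta stays positive after every query and never turns negative while idle, that matches the pattern described above.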