PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed
r/LocalLLaMA
If you run Ollama, vLLM, TGI, or any custom model server that loads and unloads models, you've probably seen RSS creep up over hours until Linux kills the process.

It's not a Python leak. It's not PyTorch. It's glibc's heap allocator fragmenting and never returning pages to the OS.

Fix:

export MALLOC_MMAP_THRESHOLD_=65536
export MALLOC_TRIM_THRESHOLD_=65536

Set these before your process starts. That's it.

We tested this on 13 diffusion models cycling continuously. Before: OOM at 52GB after 17 hours. After: stable at ~1.2GB indefinitely.
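If you don't want these exported for your whole shell session, one option is a small wrapper that sets them only for the server process. This is a rough sketch, not something from our test setup: the function name `with_malloc_thresholds` is made up, and it assumes a POSIX shell with the standard `env` utility. glibc reads these variables when the process starts, so changing them from inside an already-running server (for example via `os.environ` in Python) won't affect that process.

```shell
#!/bin/sh
# Hypothetical helper: run a command with both glibc malloc thresholds
# set in that command's environment only (the calling shell is untouched).
with_malloc_thresholds() {
    env MALLOC_MMAP_THRESHOLD_=65536 \
        MALLOC_TRIM_THRESHOLD_=65536 \
        "$@"
}
```

You'd then launch with something like `with_malloc_thresholds python server.py`, where `server.py` stands in for whatever starts your model server.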