PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed

r/LocalLLaMA

If you run Ollama, vLLM, TGI, or any custom model server that loads and unloads models, you've probably seen RSS creep up over hours until Linux kills the process.

It's not a Python leak. It's not PyTorch. It's glibc's heap allocator fragmenting and failing to return freed pages to the OS.

Fix:

    export MALLOC_MMAP_THRESHOLD_=65536
    export MALLOC_TRIM_THRESHOLD_=65536

Set these before your process starts. That's it. Both values are in bytes (64 KiB): allocations of 64 KiB or more get served by mmap and go straight back to the OS on free, and glibc trims the top of the heap once more than 64 KiB is free there.

We tested this on 13 diffusion models cycling continuously. Before: OOM at 52 GB after 17 hours. After: stable at ~1.2 GB indefinitely.
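If you start your server from a Python launcher rather than a shell, here is a minimal sketch of one way to apply the same two variables. glibc reads them when the process starts, so assigning to `os.environ` inside a server that is already running is too late; they have to be in the environment the child is launched with. The helper names (`tuned_env`, `launch`, `read_rss_kib`) are made up for illustration and are not part of any library, and the RSS reader assumes a Linux `/proc` filesystem.

```python
import os
import subprocess

# The two glibc malloc settings from the post, in bytes (64 KiB).
MALLOC_ENV = {
    "MALLOC_MMAP_THRESHOLD_": "65536",
    "MALLOC_TRIM_THRESHOLD_": "65536",
}


def tuned_env(base=None):
    """Return a copy of the environment with the malloc settings added.

    `base` defaults to the current process environment. The input
    mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(MALLOC_ENV)
    return env


def launch(cmd):
    """Start `cmd` (a list of arguments) with the tuned environment.

    Hypothetical helper: the variables must be present when the child
    starts, which is why they are passed via `env=` here.
    """
    return subprocess.Popen(cmd, env=tuned_env())


def read_rss_kib(pid):
    """Read the resident set size of `pid` in KiB from /proc (Linux only).

    Returns None if the VmRSS line is not present.
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None
```

For example, `proc = launch(["python", "server.py"])` (where `server.py` is a placeholder for your own entry point) starts the server with both variables set, and polling `read_rss_kib(proc.pid)` every few minutes is a cheap way to check whether RSS actually stays flat on your workload.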