AI RESEARCH
New in llama.cpp: Model Management
Hugging Face Blog
•
Quick Start Features Examples Chat with a specific model List available models Manually load a model Unload a model to free VRAM Key Options Also available in the Web UI Join the Conversation llama.cpp server now ships with router mode, which lets you dynamically load, unload, and switch between multiple models without restarting.