Will llama.cpp multislot improve speed?
r/LocalLLaMA
I've mostly heard negative opinions about running multiple slots in llama.cpp (--parallel > 1). It may well be worse at this than vLLM, but I recently tried vLLM with 4 slots and it did improve overall speed significantly (from 150-170 tps decode on a single llama.cpp slot to 400 tps with 4-slot vLLM, of