Does llama-swap actually work with mlx_lm.server / MLX models on macOS?

r/LocalLLaMA
Generative AI Open Source AI

I’m trying to use llama-swap with an MLX model on a M2 Max instead of just llama-server. I got mlx_lm.server working directly with /v1/chat/completions, but I’m not sure whether llama-swap reliably s this setup. I have tried to edit the llama-swap config accordingly, however it didnt work. It looks like it is loading the model but nothing happens.