vLLM ROCm has been added to Lemonade as an experimental backend

r/LocalLLaMA
Generative AI Open Source AI AI Tools

VLLM has the ability to run.safetensors LLMs before they are converted to GGUF and represents a new engine to explore. I personally had never tried it out until u/krishna2910-amd/ u/mikkoph and u/sa1sr1 made it as easy as running llama.cpp in Lemonade: lemonade backends install vllm:rocm lemonade run Qwen3.5-0.8B-vLLM This is an experimental backend for us in the sense that the essentials are implemented, but there are known rough edges. We want the community's feedback to see where and how far we should take this.