16x AMD MI50 32GB at 32 t/s (tg) & 2k t/s (pp) with Qwen3.5 397B (vllm-gfx906-mobydick)

r/LocalLLaMA

Qwen3.5 397B A17B GPTQ 4-bit @ 32 tok/s (output) and 2000 tok/s (input of 20k tok) on vllm-gfx906-mobydick, running on a 16x MI50 32GB setup.

Github link of vllm fork:

Power draw: 550W (idle) / 2400W (peak inference)

Goal: run Qwen3.5 397B A17B GPTQ 4-bit on the most cost-effective hardware, such as 16x MI50, at decent speed (token generation & prompt processing).

Coming next: open-sourcing a future test setup of 32x AMD MI50 32GB for Kimi K2.5 Thinking and/or GLM-5.

Credits: BIG thanks to the Global Open Source Community!

All setup details here:

Feel free to ask any questions and/or share any comments.
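For a rough sense of what the figures above imply, here is a back-of-the-envelope sketch in Python. It only uses the numbers from the post (16 GPUs, 32GB each, 397B parameters, 2000 tok/s prompt processing, 32 tok/s generation, 2400W peak). Everything else is an assumption for illustration: 4-bit weights are counted as exactly 0.5 bytes per parameter (ignoring GPTQ scales, KV cache and activations), the rig is assumed to sit at peak power for the whole request, and the 1,000-token reply length and the `estimate` helper are hypothetical.

```python
# Back-of-the-envelope numbers derived from the figures in the post.
# Assumptions (NOT from the post): 4-bit weights cost exactly 0.5 bytes per
# parameter (no GPTQ scales/zero-points, no KV cache, no activations), and
# the rig draws its 2400 W peak for the entire request.

NUM_GPUS = 16
VRAM_PER_GPU_GB = 32
PARAMS_B = 397          # total parameters, in billions
BYTES_PER_PARAM = 0.5   # assumption: pure 4-bit weights
PP_TOK_S = 2000         # prompt processing speed from the post
TG_TOK_S = 32           # token generation speed from the post
PEAK_W = 2400           # peak inference power draw from the post


def estimate(prompt_tokens: int, output_tokens: int) -> dict:
    """Estimate memory headroom, latency and energy for one request."""
    prefill_s = prompt_tokens / PP_TOK_S
    decode_s = output_tokens / TG_TOK_S
    total_s = prefill_s + decode_s
    return {
        "total_vram_gb": NUM_GPUS * VRAM_PER_GPU_GB,
        "weights_gb": PARAMS_B * BYTES_PER_PARAM,  # billions of params * bytes = GB
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "energy_wh": PEAK_W * total_s / 3600,      # upper bound: peak power throughout
    }


if __name__ == "__main__":
    # 20k-token prompt as in the post; the 1,000-token reply is hypothetical.
    for name, value in estimate(20_000, 1_000).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the weights take well under half of the combined VRAM, and for a long prompt the time is still dominated by generation rather than prompt processing. Real usage will differ once quantization metadata, KV cache and actual power curves are taken into account.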