Bench 2xMI50 Qwen3.5 27b vs Gemma4 31B (vllm-gfx906-mobydick)
r/LocalLLaMA
•
Generative AI
Open Source AI
AI Tools
Inference engine used (vllm fork): Huggingface Quants used: QuantTrio/Qwen3.5-27B-AWQ vs cyankiwi/gemma-4-31B-it-AWQ-4bit Relevant commands to run: docker run -it --name vllm-gfx906-mobydick - ~/llm/models:/models --network host --device=/de/kfd --device=/de/dri --group-add video --group-add $(getent group render | cut -d: -f3) --ipc=host aiinfos/vllm-gfx906-mobydick:latest FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" OMP_NUM_THREADS=4 VLLM_LOGGING_LEVEL=DEBUG vllm serve \ /models/gemma-4-31B-it-AWQ-4bit \ --served-model-name gemma-4-31B-it-AWQ-4bit \ --dtype float16 \ --max-model-len auto.