MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)
r/LocalLLaMA
TL;DR: The results in the title are for single inference with two prompts, of 1k and 15k tokens. No MTP (it’s slower for big prompts), no DFlash (it works too, but is also slower for big prompts), and no quantization (I wanted full precision). The results are pretty good for a 2018 card. (The bench was run with TP8, but the unquantized model also fits with TP2 and still runs pretty fast, around 34 tps TG.) IMO, it’s fully usable with Claude Code, Hermes, or any other agentic harness. I think there’s still room to go higher (by updating the software & hardware stacks, eg.