Just some Qwen 3.5 benchmarks for an MI60 32GB VRAM GPU - from 4B to 122B at varying quants and various context depths (0, 5000, 20000, 100000) - performs pretty well despite its age

r/LocalLLaMA

Llama.cpp ROCm Benchmarks - MI60 32GB VRAM

Hardware: MI60 32GB VRAM, i9-14900K, 96GB DDR5-5600
Build: 43e1cbd6c
Backend: ROCm, Flash Attention enabled

Qwen 3.5 4B Q4_K (Medium)

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21B | ROCm | 999 | 1 | pp512 | 1232.35 ± 1.05 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21B | ROCm | 999 | 1 | tg128 | 49.48 ± 0.03 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21B | ROCm | 999 | 1 | pp512 @ d5000 | 1132.48 ± 2.11 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21B | ROCm | 999 | 1 | tg128 @ d5000 | 48.47 ± 0.06 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21B | ROCm | 999 | 1 | pp512 @ d20000 | 913.43 ± 1.37 |
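
To make the depth scaling easier to read, here is a minimal Python sketch that expresses each result as a fraction of its depth-0 baseline. The numbers are copied from the Qwen 3.5 4B Q4_K rows in this post. The names `pp512`, `tg128`, and `retention` are just illustrative helpers for this sketch, not part of llama.cpp.

```python
# Throughput figures (tokens/s) copied from the Qwen 3.5 4B Q4_K rows in this post.
# Keys are context depths in tokens; only rows that appear in the post are included.
pp512 = {0: 1232.35, 5000: 1132.48, 20000: 913.43}  # prompt processing
tg128 = {0: 49.48, 5000: 48.47}                      # token generation


def retention(results):
    """Return each depth's throughput as a fraction of the depth-0 baseline."""
    baseline = results[0]
    return {depth: tps / baseline for depth, tps in results.items()}


pp_retention = retention(pp512)
tg_retention = retention(tg128)

for depth, frac in pp_retention.items():
    print(f"pp512 @ d{depth}: {frac:.1%} of baseline")
for depth, frac in tg_retention.items():
    print(f"tg128 @ d{depth}: {frac:.1%} of baseline")
```

On these numbers, prompt processing keeps roughly 92% of its depth-0 speed at 5,000 tokens of context and roughly 74% at 20,000, while generation keeps about 98% at 5,000.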