Strix Halo ROCm + MTP Notes (May 2026)

r/LocalLLaMA

With the MTP merge into mainline llama.cpp, I wanted to try out some other optimizations I could think of. I ended up testing backends, MTP, and a bump to the ROCm nightlies.

What's changed:

- ROCm 7.13 works on gfx1151 (7.2.2 could see the GPU but couldn't compile shaders)
- MTP merged to llama.cpp main yesterday (May 16)

I ran 3 models x 2 backends x 3 prompt lengths, plus a full-context decode test.

The headline: ROCm decode speed drops 64% at full context, but MTP recovers most of it. Vulkan barely drops.
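For anyone who wants to sanity-check the math, here is a rough sketch of how the test grid and the "drops X%" figure work out. The model and prompt-length names are placeholders (they aren't listed in this part of the post), and the 100 and 36 tok/s values are made-up numbers chosen only to show what a 64% drop looks like, not measurements.

```python
from itertools import product

# Placeholder labels: the actual models and prompt lengths are not named here.
MODELS = ["model_a", "model_b", "model_c"]
BACKENDS = ["rocm", "vulkan"]
PROMPT_LENGTHS = ["short", "medium", "long"]


def benchmark_grid():
    """Return every (model, backend, prompt length) combination."""
    return list(product(MODELS, BACKENDS, PROMPT_LENGTHS))


def percent_drop(baseline_tps, full_context_tps):
    """Relative throughput loss in percent, from baseline to full context."""
    return 100.0 * (baseline_tps - full_context_tps) / baseline_tps


if __name__ == "__main__":
    # 3 models x 2 backends x 3 prompt lengths
    print(len(benchmark_grid()))
    # Illustrative numbers only: 100 tok/s falling to 36 tok/s is a 64% drop.
    print(percent_drop(100.0, 36.0))
```

So the grid is 18 runs before the full-context decode test, and a 64% drop means the backend keeps only about a third of its baseline decode speed.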