Vulkan now faster on PP AND TG on AMD Hardware?

Hey guys, I ran some new llama-bench tests with the newest llama.cpp updates and compared my Vulkan and ROCm builds again. I am on Fedora 43 with ROCm 7.1.1, with an AMD Radeon Pro W7800 48GB and a Radeon 7900 XTX 24GB. In the past, ROCm was always faster on PP (prompt processing) but comparable or 10% slower on TG (token generation).
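For anyone who wants to put a number on "faster" or "10% slower", here is a minimal sketch of how the relative difference between two backends could be computed from the tokens/s values that llama-bench reports. The function name and the input values are hypothetical placeholders for illustration, not measurements from my runs.

```python
def relative_speedup(vulkan_tps: float, rocm_tps: float) -> float:
    """Return how much faster Vulkan is than ROCm, in percent.

    A negative result means Vulkan is slower than ROCm.
    Both arguments are throughput values in tokens per second.
    """
    if rocm_tps <= 0:
        raise ValueError("rocm_tps must be positive")
    return (vulkan_tps / rocm_tps - 1.0) * 100.0


# Placeholder numbers only (NOT real benchmark results):
# a backend at 90 t/s versus a baseline at 100 t/s is roughly 10% slower.
example_diff = relative_speedup(90.0, 100.0)
print(f"Vulkan vs ROCm: {example_diff:+.1f}%")
```

Run it once with the PP numbers and once with the TG numbers, since the two can point in different directions.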