RDNA2 flash attention isn't enabled in stock builds; I enabled it with this build and doubled my speed

r/LocalLLaMA

What's good, everybody. I probably have the fastest possible setup on these AMD Radeon RDNA2 GPUs, for one reason only: a custom binary that bypasses an assert statement that causes a crash in today's stock releases. With that assert bypassed, flash attention is enabled. It works for a ROCm llama.cpp build with Qwen3.6 35B.

TL;DR: Vulkan: 30 tok/s. Stock ROCm: doesn't run. This build: 70-80 tok/s. Try it yourself. If you try to run flash attention on ROCm with this hardware on a stock llama.cpp build, you will hit a wall.
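For anyone who wants to put the numbers in perspective, here is a quick back-of-the-envelope check of the speedup over Vulkan. It only uses the figures quoted in this post (30 tok/s on Vulkan, 70-80 tok/s with this build); the function and variable names are made up for illustration, and your own numbers will depend on model, quant, and context size.

```python
def speedup_range(baseline_tps, new_low_tps, new_high_tps):
    """Return (low, high) speedup factors relative to a baseline tok/s figure."""
    if baseline_tps <= 0:
        raise ValueError("baseline must be positive")
    return new_low_tps / baseline_tps, new_high_tps / baseline_tps


# Figures as reported in the post (not re-measured here).
VULKAN_TPS = 30.0
ROCM_FA_TPS = (70.0, 80.0)

low, high = speedup_range(VULKAN_TPS, *ROCM_FA_TPS)
print(f"{low:.2f}x to {high:.2f}x faster than Vulkan")
```

So "doubled" is, if anything, slightly conservative for the reported range. Swap in your own measurements to compare.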