Has prompt processing taken a massive hit in llama.cpp for ROCm recently?
r/LocalLLaMA
ROCm Prefill Performance Drop on 7900 XTX

I've been looking to set up a dual 7900 XTX system and recently put my PowerColor Hellhound 7900 XTX back into the machine to benchmark it before splitting the PCIe lanes between it and my Trio. Annoyingly, prompt processing in llama-bench has dropped significantly while token generation has increased. I'm running openSUSE Tumbleweed with ROCm packages and didn't even realise this was happening until I checked my OpenWebUI chat logs against fresh llama-bench results.
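For anyone wanting to put a number on this kind of regression, here's a rough sketch of how two llama-bench runs could be compared. It assumes each run was saved with `-o json` and that every result row carries `n_prompt`, `n_gen` and `avg_ts` (average tokens per second) fields; those field names are my assumption about the JSON layout, so check them against your own output. The function names are made up for illustration.

```python
import json


def label(row):
    """Name a test in llama-bench style: pp<N> for prefill, tg<N> for generation."""
    n_prompt = row.get("n_prompt", 0)
    n_gen = row.get("n_gen", 0)
    if n_prompt > 0 and n_gen == 0:
        return f"pp{n_prompt}"
    if n_gen > 0 and n_prompt == 0:
        return f"tg{n_gen}"
    return f"pp{n_prompt}+tg{n_gen}"


def percent_change(old_rows, new_rows):
    """Return {test label: % change in tokens/s} for tests present in both runs."""
    # Assumed field: "avg_ts" holds average tokens per second for the row.
    old = {label(r): r["avg_ts"] for r in old_rows}
    new = {label(r): r["avg_ts"] for r in new_rows}
    return {
        name: (new[name] - old[name]) / old[name] * 100.0
        for name in old
        if name in new and old[name] > 0
    }


def load(path):
    """Read a JSON file containing a list of llama-bench result rows."""
    with open(path) as f:
        return json.load(f)
```

Usage would be something like `percent_change(load("old.json"), load("new.json"))`, where the two files (hypothetical names) come from running the same model and settings on the old and new builds. A negative value for a `pp` test alongside a positive one for a `tg` test would match what I'm describing.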