Using a Radeon 9060 XT 16 GB, the gemma4 24b a4b iq4 nl model achieves 25.9 t/s
r/LocalLLaMA
•
Generative AI
Open Source AI
AI Research
I'm testing running local LLMs on a gaming mini PC (AMD 7840HS, 32 GB RAM) paired with an eGPU (Radeon 9060XT with 16 GB VRAM). Since I'm not very familiar with using llama.cpp, I kept getting unsatisfactory results, but with the recent Gemma4 24B A4B IQ4 NL model I finally reached 25.9 t/s. I even connected it to OpenCode and tried asking questions from my codebase, and it seems usable at this level.