Follow-up: Qwen3-30B-A3B at 7-8 t/s on a Raspberry Pi 5 8GB (source included)

r/LocalLLaMA
Generative AI

Disclaimer: everything here runs locally on the Pi 5: no API calls, no eGPU, etc. Source and image are available below. This is a follow-up to my post from about a week ago. Since then I've added an SSD and the official Active Cooler, switched to a custom ik_llama.cpp build, and got prompt caching working. The results are significantly better. It's running byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF, specifically the Q3_K_S 2.66 bpw quant. On a Pi 5 8GB with an SSD, I'm getting 7-8 t/s at a 16,384-token context length. Huge thanks to u/PaMRxR for pointing me to the ByteShape quants in the first place.
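For a rough sense of what "2.66 bpw" means on an 8GB board, here's a back-of-envelope estimate of the weight footprint. This is a sketch under assumptions that are not from this post: about 30.5B total parameters and about 3.3B active per token (the figures published on the Qwen3-30B-A3B model card), with 2.66 bits per weight applied uniformly. Real GGUF files mix quant types per tensor and carry metadata, so the actual file size on disk is the authoritative number.

```python
# Back-of-envelope estimate of quantized weight size for an MoE model.
# ASSUMPTIONS (not from the post): ~30.5B total params and ~3.3B active
# params per token, with a uniform 2.66 bits per weight.
TOTAL_PARAMS = 30.5e9
ACTIVE_PARAMS = 3.3e9
BITS_PER_WEIGHT = 2.66


def weight_bytes(params: float, bpw: float) -> float:
    """Bytes needed to store `params` weights at `bpw` bits per weight."""
    return params * bpw / 8


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


total = weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT)
active = weight_bytes(ACTIVE_PARAMS, BITS_PER_WEIGHT)

print(f"all weights:               {gib(total):.2f} GiB")
print(f"weights touched per token: {gib(active):.2f} GiB")
```

Under these assumptions the full weight set comes out to roughly 9.4 GiB, while only about 1 GiB of weights is involved in each generated token, which is the property of an "A3B" mixture-of-experts model that makes this size class approachable on a Pi at all.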