Raspberry Pi 5 LLM performance

r/LocalLLaMA

Hey all,

To preface: a while ago I asked whether anyone had benchmarks for larger (30B/70B) models on a Raspi. There were none, or at least I didn't find any. This is just me sharing benchmarks for anyone who needs them or finds them interesting.

I tested the following models:

- Qwen3.5 from 0.8B to 122B-A10B
- Gemma 3 12B

Here is my setup, along with the llama-bench results at zero context and at a depth of 32k to show how much performance degrades. I'm going for quality over speed, so of.