Gemma 4 MoE hitting 120 TPS on Dual 3090s!

r/LocalLLaMA

Thought I'd share some benchmark numbers from my local setup.

Hardware: Dual NVIDIA RTX 3090s
Model: Gemma 4 (MoE architecture)
Performance: ~120 tokens per second

The efficiency of this MoE implementation is impressive. Even under heavy load, throughput stays very consistent. It's a massive upgrade for anyone running local LLMs for high-frequency tasks or complex agentic workflows. At this speed responses feel close to instantaneous, which is a big step up compared to older dense models. If you have the VRAM to spare, this is definitely the way to go.
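For anyone who wants to compare numbers, here is a rough sketch of how a tokens-per-second figure like this can be measured. It is not the script behind the number above. `measure_tps` and the `generate` callable are hypothetical names; `generate` stands in for whatever streaming call your backend exposes, and it is assumed to yield one item per generated token.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput = generated tokens / wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def ms_per_token(tps: float) -> float:
    """Average per-token latency in milliseconds for a given throughput."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return 1000.0 / tps


def measure_tps(generate: Callable[[], Iterable[str]]) -> float:
    """Time one streamed generation and return tokens per second.

    `generate` is a hypothetical stand-in for your backend's streaming
    call; it is assumed to yield exactly one item per generated token.
    """
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate())  # consume the stream, counting tokens
    elapsed = time.perf_counter() - start
    return tokens_per_second(n_tokens, elapsed)


if __name__ == "__main__":
    # Plain arithmetic on the figure from the post: 120 TPS is roughly
    # 8.3 ms per token, so a 500-token reply takes a little over 4 seconds.
    print(f"{ms_per_token(120.0):.1f} ms/token")
    print(f"{500 / 120.0:.1f} s for 500 tokens")
```

Note that timing the whole call this way folds prompt processing into the result, so short generations will read lower than the steady-state decode speed. Averaging over several longer runs gives a more stable number.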