I Ran Google's Latest Gemma 4 Models on a 48GB GPU. Here's What Actually Happened.

Dev.to AI

This week Google dropped Gemma 4, and I wanted to test all four variants on my workstation. The specs looked interesting: two small edge models (2B and 4B), an MoE model that claims "26B total but only 4B active", and a dense 31B beast. The question was simple: which ones actually run on a single RTX A6000 with 48GB of VRAM? The internet had answers. Most said you'd need 4-bit quantization for the larger models. Some said the MoE wouldn't fit at all. I decided to test everything in full bfloat16 precision, no quantization, and measure what actually happens. I didn't do this manually.
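
For context, here is the back-of-the-envelope arithmetic behind the "will it fit?" question. This is a rough sketch, not a measurement: it assumes the nominal parameter counts quoted above (2B, 4B, 26B, 31B), 2 bytes per parameter for bfloat16, and decimal gigabytes. It counts weights only, so KV cache, activations, and framework overhead come on top. It also assumes an MoE model has to keep all of its experts resident, so the "26B total" figure is what matters for memory, not the "4B active" one. The function names are made up for illustration.

```python
# Weights-only memory estimate for bfloat16 inference.
# Assumptions (not from any official spec): nominal parameter counts
# as quoted in the post, 2 bytes per parameter, decimal GB (1e9 bytes).

BYTES_PER_PARAM_BF16 = 2
VRAM_GB = 48  # single RTX A6000, as stated in the post

# Nominal sizes in billions of parameters (labels are informal).
NOMINAL_PARAMS_BILLIONS = {
    "2B edge": 2,
    "4B edge": 4,
    "26B MoE (total, not active)": 26,
    "31B dense": 31,
}


def weights_gb(params_billions, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Return the size of the weights alone, in decimal GB."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def fits_weights_only(params_billions, vram_gb=VRAM_GB):
    """True if the weights alone fit in VRAM; ignores KV cache and overhead."""
    return weights_gb(params_billions) <= vram_gb


if __name__ == "__main__":
    for name, billions in NOMINAL_PARAMS_BILLIONS.items():
        size = weights_gb(billions)
        verdict = "fits" if fits_weights_only(billions) else "exceeds 48 GB"
        print(f"{name:30s} ~{size:3d} GB of weights -> {verdict}")
```

By this estimate alone, the two edge models leave plenty of headroom, while the 26B and 31B weights both come out above 48 GB in bfloat16. That is why the internet's advice leaned toward quantization, and why measuring what actually happens is worth the effort.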