Per-Layer Embeddings: A simple explanation of the magic behind the small Gemma 4 models

r/LocalLLaMA

Many of you seem to have liked my recent post "A simple explanation of the key idea behind TurboQuant". Now, I'm really not much of a blogger, and I usually like to invest all my available time into developing Heretic, but there is another really cool new development with a lot of confusion around it, so I decided to write another quick explainer post.

You may have noticed that the brand-new Gemma 4 model family includes two small models: gemma-4-E2B and gemma-4-E4B. Yup, that's an "E", not an "A".