Per-Layer Embeddings: A simple explanation of the magic behind the small Gemma 4 models

Many of you seem to have liked my recent post "A simple explanation of the key idea behind TurboQuant". Now I'm really not much of a blogger and I usually like to invest all my available time into developing Heretic, but there is another really cool new development happening with lots of confusion around it, so I decided to make another quick explainer post. You may have noticed that the brand-new Gemma 4 model family includes two small models: gemma-4-E2B and gemma-4-E4B. Yup, that's an "E", not an "A.