Gemma 4 has a systemic attention failure. Here's the proof.
r/LocalLLaMA
I've spent months building a diagnostic method for large language models. It catches what standard benchmarks miss: distributional collapse inside tensors, not just loss or perplexity. Gemma 4 26B A4B fails it. I analyzed the Q8_0 quant of Gemma 4 26B A4B from Unsloth and found 29 tensors with distribution drift, 21 of them attention layers. Full log: 29 tensors with KL (Kullback-Leibler) drift, 21 of them attention layers (attn_k, attn_q, attn_