Gemma 4 has a systemic attention failure. Here's the proof.

r/LocalLLaMA

I've spent months building a diagnostic method for large language models. It catches what standard benchmarks miss: distributional collapse inside tensors, not just changes in loss or perplexity. Gemma 4 26B A4B fails it.

I analyzed the Q8_0 quant of Gemma 4 26B A4B from Unsloth and found 29 tensors with distribution drift, 21 of them in attention layers. Full log: 29 tensors with KL (Kullback-Leibler) drift. 21 of them are attention layers (attn_k, attn_q, attn_