I spent a weekend doing layer surgery on 6 different model architectures. There's a "danger zone" at 50% depth that kills every one of them.

r/LocalLLaMA
NLP AI Hardware

TL;DR: Duplicated transformer layers in 5 model architectures (Dense 32B, Hybrid 9B, MoE 30B, Dense 3B, cross-model transplant 7B). Found a universal "danger zone" at ~50-56% depth that kills models regardless of architecture. Optimal duplication depth varies by type. Cross-model layer transplant is a hard no - matching dimensions isn't enough. Minimum viable model: ~3B. All local on Apple Silicon (M3 Ultra, 512GB) via MLX. No cloud, no API, no