Mixture of Layers with Hybrid Attention

arXiv:2605.09516v1 Announce Type: cross Standard Mixture-of-Experts (MoE) transformers route tokens to expert subnetworks within each layer, but the layer structure itself remains monolithic. We