Provably Shorter Scratchpads in Hybrid DeltaNet-Attention Decoders

arXiv:2605.16640v1

We investigate the expressive power of hybrid recurrent-attention decoders, a class of architectures used in recent open-source language models such as Qwen3-Next and its successors. These models combine Gated Attention heads with recurrent Gated DeltaNet heads. Is there a formal advantage, in terms of model expressivity or efficiency, to such a hybrid architecture? We show that there is.