
FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control

arXiv CS.LG

arXiv:2604.19021v1 Announce Type: new Linear attention mechanisms have emerged as promising alternatives to softmax attention, offering linear-time complexity during inference. Recent advances such as Gated DeltaNet (GDN) and Kimi Delta Attention (KDA) have demonstrated that the delta rule, an online gradient descent update, enables superior associative recall compared to simple additive updates.
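
To make the contrast between the two update types concrete, the following is a minimal sketch, not the GDN or KDA implementation. It assumes a plain matrix-valued memory S, a unit-norm key, and a fixed step size beta = 1, and it omits the gating that those models add. All function and variable names are hypothetical and chosen for illustration only. The same key is written twice with different values: the additive update accumulates both values, while the delta rule replaces the old value with the new one.

```python
# Illustrative sketch only: plain delta rule vs. additive update on a
# small matrix memory. Gating (as in GDN/KDA) is intentionally left out.

def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]

def matvec(S, k):
    """Read from memory: S k."""
    return [sum(sij * kj for sij, kj in zip(row, k)) for row in S]

def additive_update(S, k, v):
    """Additive write: S <- S + v k^T."""
    vk = outer(v, k)
    return [[s + u for s, u in zip(rs, ru)] for rs, ru in zip(S, vk)]

def delta_update(S, k, v, beta=1.0):
    """Delta-rule write: S <- S + beta * (v - S k) k^T.

    This is one gradient descent step on 0.5 * ||S k - v||^2 with
    step size beta (an assumed constant here).
    """
    pred = matvec(S, k)                       # what the memory currently returns for k
    err = [vi - pi for vi, pi in zip(v, pred)]  # prediction error v - S k
    corr = outer(err, k)
    return [[s + beta * c for s, c in zip(rs, rc)] for rs, rc in zip(S, corr)]

def zeros(d_v, d_k):
    return [[0.0] * d_k for _ in range(d_v)]

k = [1.0, 0.0]        # unit-norm key, reused for both writes
v_old = [1.0, 2.0]    # first value bound to k
v_new = [5.0, -1.0]   # second value bound to the same k

S_add = zeros(2, 2)
S_delta = zeros(2, 2)
for v in (v_old, v_new):
    S_add = additive_update(S_add, k, v)
    S_delta = delta_update(S_delta, k, v)

read_add = matvec(S_add, k)      # v_old + v_new: the two writes interfere
read_delta = matvec(S_delta, k)  # v_new: the old association was overwritten

print("additive read:", read_add)
print("delta read:   ", read_delta)
```

With beta = 1 and a unit-norm key, the delta-rule read returns exactly the most recent value, which is the behavior that associative recall requires. With a smaller beta or a non-normalized key, the correction would be only partial.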