AI RESEARCH

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

arXiv cs.LG

arXiv:2604.21100v1, Announce Type: new

To address the growing compute cost of softmax attention at long context lengths, several subquadratic recurrent operators have been developed, including Mamba-2, DeltaNet, Gated DeltaNet (GDN), and Kimi Delta Attention (KDA). As the space of recurrences grows, a parallel line of work has arisen to taxonomize them. One compelling view is the test-time regression (TTR) framework, which interprets recurrences as performing online least-squares updates that learn a linear map from keys to values.
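To make the test-time regression view concrete, here is a minimal sketch of the classic delta rule written as one online least-squares step. The abstract gives no equations, so everything here is an assumption: the update form S <- S - beta * (S k - v) k^T is the standard delta rule (one gradient step on 0.5 * ||S k - v||^2), and the names `delta_rule_step`, `read`, and `beta` are illustrative. It is not the preconditioned method of the paper.

```python
# Sketch only: the standard delta rule viewed as online least squares.
# Function names and the step size `beta` are illustrative assumptions.

def delta_rule_step(S, k, v, beta):
    """Return the state after one gradient step on 0.5 * ||S k - v||^2.

    S is a d_v x d_k matrix stored as a list of rows, k is a key of
    length d_k, v is a value of length d_v, and beta is the step size.
    """
    # Prediction of the current linear map for this key: S k
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    # Residual between the prediction and the target value
    err = [p - t for p, t in zip(pred, v)]
    # The gradient with respect to S is the outer product err k^T
    return [[s - beta * e * kj for s, kj in zip(row, k)]
            for row, e in zip(S, err)]


def read(S, q):
    """Query the learned key-to-value map: returns S q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


# Two orthonormal keys, each paired with a value.
k1, v1 = [1.0, 0.0], [2.0, -1.0]
k2, v2 = [0.0, 1.0], [0.5, 3.0]

S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_rule_step(S, k1, v1, beta=1.0)
S = delta_rule_step(S, k2, v2, beta=1.0)

print(read(S, k1))  # value stored under k1
print(read(S, k2))  # value stored under k2
```

With orthonormal keys and a step size of 1, each update stores its value exactly, and writing a new value under an existing key overwrites the old one without disturbing the other key. With correlated keys, a single step only partially fits each pair.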