Variational Linear Attention: Stable Associative Memory for Long-Context Transformers

arXiv:2605.11196v1 Announce Type: new Linear attention reduces the quadratic cost of softmax attention to $\mathcal{O}(T)$, but its memory state grows as $\mathcal{O}(T)$ in Frobenius norm, causing progressive interference between stored associations. We