Hybrid Associative Memories

arXiv:2603.22325v1 Announce Type: cross

Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, they build this memory through two orthogonal mechanisms: RNNs compress the entire past into a fixed-size state, whereas self-attention stores every past time step, so its state (the KV cache) grows linearly with the sequence length. This results in orthogonal strengths and weaknesses.
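To make the memory contrast concrete, here is a minimal sketch in plain Python. It assumes a linear-attention-style RNN whose state is a d x d matrix updated by the outer product of each key and value. This is one common recurrent memory, not necessarily the one used in the paper. It sets that memory beside a softmax-attention KV cache. The class and function names (`FixedStateMemory`, `KVCacheMemory`, `outer_add`, `num_floats`) are hypothetical and chosen for illustration only.

```python
import math
import random

def outer_add(state, k, v):
    # S <- S + k v^T: the state stays d x d no matter how many tokens arrive
    for i, ki in enumerate(k):
        row = state[i]
        for j, vj in enumerate(v):
            row[j] += ki * vj

class FixedStateMemory:
    """Hypothetical linear-RNN memory: compresses the past into a d x d matrix."""
    def __init__(self, d):
        self.state = [[0.0] * d for _ in range(d)]

    def write(self, k, v):
        outer_add(self.state, k, v)

    def read(self, q):
        # y = S^T q = sum_t (q . k_t) v_t: past tokens are only reachable in compressed form
        d_v = len(self.state[0])
        return [sum(q[i] * self.state[i][j] for i in range(len(q))) for j in range(d_v)]

    def num_floats(self):
        return sum(len(row) for row in self.state)

class KVCacheMemory:
    """Softmax-attention memory: keeps every past (key, value) pair verbatim."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, k, v):
        self.keys.append(list(k))
        self.values.append(list(v))

    def read(self, q):
        # scaled dot-product scores against every stored key, then a stable softmax
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        d_v = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) / z for j in range(d_v)]

    def num_floats(self):
        return sum(len(k) + len(v) for k, v in zip(self.keys, self.values))

random.seed(0)
d = 4
rnn, attn = FixedStateMemory(d), KVCacheMemory()
for t in range(1, 1001):
    k = [random.gauss(0, 1) for _ in range(d)]
    v = [random.gauss(0, 1) for _ in range(d)]
    rnn.write(k, v)
    attn.write(k, v)
    if t in (10, 100, 1000):
        # prints "10 16 80", then "100 16 800", then "1000 16 8000"
        print(t, rnn.num_floats(), attn.num_floats())
```

The RNN state stays at d * d = 16 floats, while the KV cache holds 2 * d floats per token and so grows linearly with the sequence. The trade-off runs the other way for recall. The cache can attend sharply to any individual past token, whereas the fixed-size state only retains a superposition of all past key-value outer products.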