MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning

arXiv:2603.20586v1 Announce Type: cross As long-context language modeling becomes increasingly important, the cost of maintaining and attending to large Key/Value (KV) caches grows rapidly, becoming a major bottleneck in both