Priming: Hybrid State Space Models From Pre-trained Transformers

arXiv:2605.08301v1 Announce Type: cross Hybrid State-Space models combine Attention with recurrent State-Space Model (SSM) layers, balancing eidetic memory from Attention with compressed fading memory from SSMs. This yields smaller Key-Value caches and faster decoding than Transformers, along with a richer architectural design space. Exploring that design space at scale has so far required