AI RESEARCH

Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

arXiv CS.AI

arXiv:2603.09555v1 Announce Type: cross State-space model releases are typically coupled to fused CUDA and Triton kernels, inheriting a hard dependency on NVIDIA hardware. We show that Mamba-2's state space duality algorithm -- diagonal state structure, chunkable recurrence, and einsum-dominated compute with static control flow -- maps cleanly onto what XLA's fusion and tiling passes actually optimise, making custom kernels optional rather than required.
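
To make the three properties named in the abstract concrete, here is a minimal sketch of a scalar-decay state space layer written only with einsums and a fixed-trip-count loop. It is not the authors' implementation. NumPy stands in for an XLA-backed array library, a single head with one scalar decay per time step is assumed, and all function names (recurrent_step, decay_matrix, dual_form, chunked_form) and shapes are hypothetical, chosen for illustration. The sketch compares the step-by-step recurrence, whose cache is one fixed-size state matrix regardless of sequence length, with the masked quadratic "dual" form and with a chunked form that mixes the two.

```python
import numpy as np

def recurrent_step(h, a_t, B_t, C_t, x_t):
    """One decoding step. The cache h has shape (N, P) and never grows."""
    h = a_t * h + np.einsum("n,p->np", B_t, x_t)   # decay old state, add new outer product
    y_t = np.einsum("n,np->p", C_t, h)             # read out through C_t
    return h, y_t

def decay_matrix(a):
    """L[t, s] = a[s+1] * ... * a[t] for s <= t, and 0 above the diagonal."""
    cum = np.cumsum(np.log(a))
    diff = cum[:, None] - cum[None, :]
    mask = np.tril(np.ones((len(a), len(a)), dtype=bool))
    return np.exp(np.where(mask, diff, -np.inf))   # exp(-inf) = 0 masks the future

def dual_form(a, B, C, x):
    """Quadratic, attention-like form: y = (L * (C B^T)) x."""
    scores = np.einsum("tn,sn->ts", C, B)
    return np.einsum("ts,sp->tp", decay_matrix(a) * scores, x)

def chunked_form(a, B, C, x, chunk):
    """Dual form inside each chunk, recurrent state carried between chunks."""
    T, N = B.shape
    P = x.shape[1]
    h = np.zeros((N, P))
    out = []
    for s in range(0, T, chunk):                   # trip count is fixed by T and chunk
        ac, Bc, Cc, xc = a[s:s + chunk], B[s:s + chunk], C[s:s + chunk], x[s:s + chunk]
        cum = np.cumsum(np.log(ac))
        y_intra = dual_form(ac, Bc, Cc, xc)        # tokens within the chunk
        y_inter = np.exp(cum)[:, None] * np.einsum("tn,np->tp", Cc, h)  # incoming state
        out.append(y_intra + y_inter)
        to_end = np.exp(cum[-1] - cum)             # decay from each step to the chunk end
        h = np.exp(cum[-1]) * h + np.einsum("t,tn,tp->np", to_end, Bc, xc)
    return np.concatenate(out), h

# Small random problem: T steps, state size N, channel size P (illustrative values).
rng = np.random.default_rng(0)
T, N, P = 8, 4, 3
a = rng.uniform(0.5, 0.99, size=T)
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=(T, P))

h = np.zeros((N, P))
ys = []
for t in range(T):
    h, y_t = recurrent_step(h, a[t], B[t], C[t], x[t])
    ys.append(y_t)
y_rec = np.stack(ys)

y_dual = dual_form(a, B, C, x)
y_chunk, h_chunk = chunked_form(a, B, C, x, chunk=4)
```

All three paths compute the same outputs up to floating-point error, and the chunked path ends with the same state as the recurrence. Because every operation is an einsum, a cumulative sum, or an elementwise op with shapes known ahead of time, this is the kind of program a fusing compiler can handle without hand-written kernels. Whether a given compiler actually matches fused-kernel performance is the paper's claim to establish, not something this sketch demonstrates.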