Support Basis: Fast Attention Beyond Bounded Entries

arXiv:2510.01643v2

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks. However, the quadratic complexity of softmax attention remains a central bottleneck that limits their scalability. Alman and Song (NeurIPS 2023a; NeurIPS 2024a) proposed sub-quadratic time algorithms for attention inference and